A general framework for discrete time stochastic optimal control is
proposed. The sequential decision model

minimize:    $E\left\{ \sum_{k=0}^{N-1} \alpha^k g(x_k, u_k, w_k) \right\}$
subject to:  $x_{k+1} = f(x_k, u_k, w_k)$,
             $u_k \in U(x_k)$,  $k = 0, 1, \ldots, N-1$,

is treated, where $w_k$ is a random disturbance with distribution parameterized
by $(x_k, u_k)$. If $N$ is finite, we define and consider models which are
summable below (F+) or summable above (F-). If $N = +\infty$, we treat the cases:

(P)  $0 < \alpha \le 1$,  $0 \le g \le +\infty$,

(N)  $0 < \alpha \le 1$,  $-\infty \le g \le 0$,

(D)  $0 < \alpha < 1$,  $-b \le g \le b < +\infty$.

The minimization problem is shown to be well posed if the state, control and
disturbance spaces are Borel spaces, the other data are Borel measurable, and
universally measurable policies are admitted. Universally measurable policies
which are $\epsilon$-optimal for every initial state are shown to exist and simple
characterizations of optimal policies are provided.

In particular, we have under the indicated conditions:

(F+)  An $\epsilon$-optimal nonrandomized Markov policy exists and can be
constructed by the dynamic programming algorithm.

(F-)  An $\epsilon$-optimal (randomized) Markov policy and an $\epsilon$-optimal
nonrandomized semi-Markov policy exist. If $\{\epsilon_n\}$ is a sequence of
positive numbers with $\epsilon_n \downarrow 0$, then a sequence of nonrandomized
Markov policies exhibiting "$\{\epsilon_n\}$ dominated convergence to optimality"
exists.

(F+), (F-)  If the infimum in the dynamic programming algorithm is achieved for
each state at each stage, then an optimal nonrandomized Markov policy exists.

(P)  An $\epsilon$-optimal nonrandomized Markov policy exists. If $\alpha < 1$, this
policy can be taken to be stationary. If for each initial state, a policy
optimal at that state exists, then a nonrandomized stationary policy optimal
at every initial state exists. Such a nonrandomized stationary optimal
policy exists if and only if the infimum in the optimality equation is
achieved for every state. Continuity and compactness conditions are
given under which the dynamic programming algorithm yields, in the limit,
the optimal cost function and an optimal nonrandomized stationary policy.

(N)  An $\epsilon$-optimal nonrandomized semi-Markov policy exists. If for each
initial state, a policy optimal at that state exists, then a semi-Markov
(randomized) policy optimal at every initial state exists. A stationary
policy is optimal if and only if its associated cost function is a fixed
point of the dynamic programming operator. The dynamic programming
algorithm yields, in the limit, the optimal cost function.

(D)  All results given for (P) and (N) hold. Sharp bounds on the rate of
convergence of the dynamic programming algorithm are established.

The method of analysis under (P), (N) and (D) is to convert the
stochastic model to an equivalent deterministic one and apply standard
deterministic results. Finally, it is shown how nonstationary models and
models with imperfect state information can be reduced to the one treated.
CHAPTER 1
INTRODUCTION


Section 1. The discrete time stochastic decision problem.

The discrete time stochastic decision model is a mathematical abstraction
of the situation in which a system progresses from state to state, incurring a
cost at each transition. The cost could be assigned to reflect the preference
one has for one state over another or could be the genuine cost of, say,
operating a business during the period of transition. A decision maker has
some influence over the stochastic manner of the transition but cannot choose
deterministically the state into which the system will move. He wishes, of
course, to exercise his influence to minimize the total expected cost of all
transitions.¹ Thus he must not only take into account the cost of the present
transition, but rather must balance his desire to minimize this against his
desire to avoid moving to a state where a high future cost is unavoidable.

A classical example of this situation, in which we treat profit as
negative cost, is portfolio management. An investor must balance his desire
to achieve immediate return, possibly in the form of dividends, against a
desire to avoid investments in areas where low long-run yield is probable. If
the total value of the portfolio is taken as the state of the system, the
stochastic nature of the problem is apparent.

¹ Many authors speak of maximization of reward rather than minimization of
cost. We follow the practice of control theorists. Although this discrepancy
involves only a change in sign, one must take care to avoid confusion. For
example, if the cost function is nonnegative, we have a "positive dynamic
programming model," which is treated extensively in Strauch [35], where it is
referred to as a "negative dynamic programming model."

Other examples can be drawn from inventory management, reservoir control,
sequential analysis (hypothesis testing) and, by discretizing a continuous
problem, from control of a large variety of physical systems subject to random
disturbances. For an extensive set of discrete time stochastic decision
models, see Bellman [2], Bertsekas [3], Dynkin and Juskevic [12], Howard [17],
Wald [37], and the references contained therein.


If a system is deterministic, then the transition from state to state can
be described by a system equation

(1.1)  $x_{k+1} = f(x_k, u_k)$,

where $x_k$, $x_{k+1}$ represent a state and its succeeding state and will be
assumed to belong to some state space $S$; $u_k$ represents a control variable
chosen by the decision maker in some constraint set $U(x_k)$, which is in turn a
subset of some control space $C$. The cost incurred by such a transition can be
given by a function $g(x_k, u_k)$. Actually the cost could depend on $x_{k+1}$ as
well, but in view of (1.1) this can be reduced to dependence on only $x_k$ and
$u_k$.


If the system is stochastic we include a disturbance $w_k$ in this
description. Equation (1.1) is replaced by

(1.2)  $x_{k+1} = f(x_k, u_k, w_k)$,

and the cost per stage becomes $g(x_k, u_k, w_k)$. The disturbance $w_k$ is a
member of some probability space $(W, \mathcal{F})$ and has distribution
$p(dw_k | x_k, u_k)$. Thus the control variable $u_k$ exercises influence over the
transition from $x_k$ to $x_{k+1}$ in two places, once in the system equation
(1.2) and again as a parameter in the distribution of the disturbance $w_k$.
Likewise the control $u_k$ influences the cost at two points. This is a
redundancy in the system equation model given above and will be eliminated in
Chapter 3 when the transition kernel and reduced one-stage cost function are
introduced.

The system equation model is more common in engineering literature and
generally more convenient in applications, so we are taking it as our starting
point. The transition kernel and reduced one-stage cost function are
technical devices which eliminate the disturbance space $(W, \mathcal{F})$ from
consideration and make the model more suitable for analysis. We take pains
initially to point out how properties of the original system carry over into
properties of the transition kernel and reduced one-stage cost function (see
Definition 3.2 and following remarks, Theorem 3.4 and following remarks). In
Chapter 6 we do not repeat this process but rather introduce the nonstationary
and imperfect state information models directly in terms of the transition
kernel and reduced one-stage cost function, leaving the reader to infer the
system equation models.
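
The passage from a system equation model to a transition kernel and a reduced
one-stage cost can be illustrated with a minimal sketch, assuming finite state,
control and disturbance sets so that no measurability question arises. The toy
inventory model and the names `reduce_model`, `kernel` and `reduced_cost` are
illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict


def reduce_model(states, U, f, g, p):
    """Eliminate the disturbance from a finite system equation model.

    f(x, u, w) is the next state, g(x, u, w) the one-stage cost, and
    p(x, u) a dict mapping each disturbance w to its probability.
    Returns the transition kernel and the reduced one-stage cost.
    """
    kernel, reduced_cost = {}, {}
    for x in states:
        for u in U(x):
            next_state_dist = defaultdict(float)
            cost = 0.0
            for w, prob in p(x, u).items():
                next_state_dist[f(x, u, w)] += prob  # image of p(dw|x,u) under f
                cost += prob * g(x, u, w)            # integral of g against p(dw|x,u)
            kernel[(x, u)] = dict(next_state_dist)
            reduced_cost[(x, u)] = cost
    return kernel, reduced_cost


# Hypothetical toy inventory model (an assumption made for illustration).
STATES = [0, 1, 2]                      # stock on hand


def U(x):                               # orders may not push stock above 2
    return range(0, 3 - x)


def f(x, u, w):                         # stock after demand w is served
    return max(x + u - w, 0)


def g(x, u, w):                         # ordering cost plus shortage penalty
    return u + 2 * max(w - x - u, 0)


def p(x, u):                            # demand is 0 or 1 with equal probability
    return {0: 0.5, 1: 0.5}


kernel, reduced_cost = reduce_model(STATES, U, f, g, p)
```

After the reduction the control enters in only one place in the dynamics (the
kernel) and one place in the cost, which is the redundancy removal described
above.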

To place our model in the literature on stochastic decision theory, we
review some terminology. In our model the distribution of state $x_{k+1}$ is
entirely determined by the distribution of $(x_k, u_k)$. Such a decision process
is called Markovian. The cost structure is additive, i.e. the total cost of
the system operation is the sum of all the one-stage costs. The horizon is
the number of stages for which the system operates and can be either finite or
infinite. We will allow the decision maker to utilize full knowledge of the
system structure (the functions $f$ and $g$ and the disturbance distribution $p$)
and, when he is choosing control $u_k$, he will know the past states and
controls $(x_0, u_0, \ldots, u_{k-1}, x_k)$. This is the case of a nonanticipative
information structure; the decision maker does not know the next state until
he has chosen the current control. The decision maker does, however, have
total recall and perfect state information, i.e. once he has observed a state
or chosen a control, this knowledge is not lost to him, and he observes the
states accurately. We shall show that despite this abundance of information,
the best control possible can be achieved by taking into account only the most
recent and perhaps the initial state. This model is stationary (also called
homogeneous) because the functions $f$ and $g$ and the disturbance distribution
$p$ are independent of the time index $k$. We hasten to add that these last two
conditions on our model really involve no loss of generality. Chapter 6 is
devoted to showing that both the imperfect state information and nonstationary
models can be reduced to the one considered here.

Stochastic sequential control is distinguished from its deterministic
counterpart by the concern with when information becomes available. In
deterministic control, a sequence of control variables $(u_0, \ldots, u_{N-1})$ can
be specified beforehand and the resulting states of the system are determined
by (1.1). In contrast, if the control variables are specified beforehand for a
stochastic system, the decision maker may realize in the course of the system
evolution that unexpected states have appeared and the specified control
variables are no longer appropriate. Thus we are led to consider policies
$\pi = (\mu_0, \ldots, \mu_{N-1})$, where $\mu_k$ is a function from history to
control. If $x_0$ is the initial state, $u_0 = \mu_0(x_0)$ is taken to be the first
control. If the states and controls $(x_0, u_0, \ldots, u_{k-1}, x_k)$ have occurred,
the control

(1.3)  $u_k = \mu_k(x_0, u_0, \ldots, u_{k-1}, x_k)$

is chosen. We require that the control constraint

$\mu_k(x_0, u_0, \ldots, u_{k-1}, x_k) \in U(x_k)$

be satisfied for every $(x_0, u_0, \ldots, u_{k-1}, x_k)$ and $k$. In this way the
decision maker utilizes the full information available to him at each stage.
Rather than choosing a sequence of control variables, the decision maker
attempts to choose a policy which minimizes the total expected cost of the
system operation.
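
To illustrate what it means to evaluate a policy that maps histories to
controls, here is a minimal sketch that computes the total expected cost exactly
by enumerating disturbances. It assumes a finite toy inventory model; the model
and the names `expected_cost`, `order_up_to_one` and `never_order` are
hypothetical and serve only as an illustration.

```python
def f(x, u, w):                         # system equation: stock after demand w
    return max(x + u - w, 0)


def g(x, u, w):                         # ordering cost plus shortage penalty
    return u + 2 * max(w - x - u, 0)


def U(x):                               # constraint set: stock may not exceed 2
    return range(0, 3 - x)


def p(x, u):                            # disturbance distribution given (x, u)
    return {0: 0.5, 1: 0.5}


def expected_cost(policy, history, k=0):
    """Expected cost of stages k, ..., N-1 given history (x0, u0, ..., xk).

    policy is a list of N functions; policy[k] maps the whole history
    to the control used at stage k.
    """
    if k == len(policy):
        return 0.0
    x = history[-1]
    u = policy[k](history)
    assert u in U(x), "control constraint violated"
    total = 0.0
    for w, prob in p(x, u).items():
        nxt = f(x, u, w)
        total += prob * (g(x, u, w) + expected_cost(policy, history + (u, nxt), k + 1))
    return total


def order_up_to_one(history):           # looks only at the most recent state
    return max(1 - history[-1], 0)


def never_order(history):               # an open-loop choice: always order nothing
    return 0


cost_feedback = expected_cost([order_up_to_one] * 2, (0,))
cost_open_loop = expected_cost([never_order] * 2, (0,))
```

In this toy instance the feedback rule reacts to the observed stock and achieves
a lower expected cost than the fixed control sequence, which is the point of
optimizing over policies rather than over control sequences.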

The analysis of the stochastic decision model outlined above can be
pretty much divided into two categories: structural considerations and
measurability considerations. Structural analysis consists of all those
results which can be gotten if measurability of all functions and sets arising
in the problem is of no real concern; for example, if the model is
deterministic or, more generally, if the disturbance space $W$ is countable.
Measurability analysis consists of showing that the structural results remain
valid even when one is forced to place nontrivial measurability restrictions
on the set of admissible policies. The present work is primarily one of
measurability analysis relying heavily on existing structural results.

One can best illustrate this dichotomy of analysis by the finite horizon
dynamic programming algorithm considered by Bellman [2]. The algorithm is
based on the intuitively appealing principle that if a policy
$\pi = (\mu_0, \ldots, \mu_{N-1})$ is optimal for the N-stage model, then the
policies $\pi^k = (\mu_{N-k}, \mu_{N-k+1}, \ldots, \mu_{N-1})$ must be optimal for
the k-stage truncated problem. Put another way, a policy can be optimal only
if every "tail" of the policy is optimal.

This observation suggests a computational procedure. Define for every
state $x$,

(1.4)  $J_0(x) = 0$,

(1.5)  $J_{k+1}(x) = \inf_{u \in U(x)} E\{ g(x, u, w) + J_k[f(x, u, w)] \}$,  $k = 0, \ldots, N-1$,

where the expectation is with respect to $p(dw | x, u)$.

It is reasonable to expect that $J_k(x)$ is the optimal cost of operating
the system over $k$ stages when the initial state is $x$, and that if $\mu_k(x)$
achieves the infimum in (1.5) for every $x$ and $k = 0, \ldots, N-1$, then
$\pi = (\mu_{N-1}, \ldots, \mu_0)$ is an optimal policy for every initial state $x$.
If there are no measurability considerations, this is indeed the case. Notice
how the right side of equation (1.5) balances the immediate cost $g(x, u, w)$
against the optimal "cost-to-go" $J_k[f(x, u, w)]$ in choosing a policy.
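
The recursion (1.4) and (1.5) can be sketched in a few lines when the state,
control and disturbance sets are finite, in which case every infimum is a
minimum and no measurability issue arises. The toy inventory model and the name
`dynamic_programming` are assumptions made for illustration only.

```python
def dynamic_programming(states, U, f, g, p, N):
    """Run the finite horizon recursion J_0 = 0, J_{k+1} = min_u E{g + J_k(f)}.

    Returns the cost functions J[0..N] and minimizing selectors mu[0..N-1].
    mu[k] attains the minimum that defines J[k+1], so it is the rule to
    apply when k+1 stages remain.
    """
    J = [{x: 0.0 for x in states}]
    mu = []
    for k in range(N):
        J_next, mu_k = {}, {}
        for x in states:
            best_u, best = None, float("inf")
            for u in U(x):
                # Expected immediate cost plus cost-to-go under control u.
                q = sum(prob * (g(x, u, w) + J[k][f(x, u, w)])
                        for w, prob in p(x, u).items())
                if q < best:
                    best_u, best = u, q
            J_next[x], mu_k[x] = best, best_u
        J.append(J_next)
        mu.append(mu_k)
    return J, mu


# Hypothetical toy inventory model (an assumption made for illustration).
STATES = [0, 1, 2]


def U(x):
    return range(0, 3 - x)


def f(x, u, w):
    return max(x + u - w, 0)


def g(x, u, w):
    return u + 2 * max(w - x - u, 0)


def p(x, u):
    return {0: 0.5, 1: 0.5}


J, mu = dynamic_programming(STATES, U, f, g, p, 2)
```

With two stages and empty initial stock, the first decision is `mu[1][0]`
(order one unit) and the resulting optimal cost is `J[2][0]`. Ties are broken
in favor of the smallest control, which is a choice of this sketch.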

The dynamic programming algorithm of (1.4) and (1.5) is the simplest and
most widely used structural result of stochastic decision theory, and much of
our effort will be directed toward proving its validity in a measure theoretic
framework. The difficulty, of course, lies in showing the expression in
braces in (1.5) is measurable. Thus we must establish measurability
properties for the functions $J_k$. Related to this is the need to balance the
measurability restrictions on policies (necessary so the expected cost
corresponding to a policy can be defined) against a desire to admit enough
policies to consideration so as to be able to find one which selects at or
near the infimum in (1.5).

Section 2. The present work related to the literature.

The goal of this thesis is to establish the suitability of a Borel space
framework with universally measurable policies for a general theory of
stochastic decision models. We show that almost every known structural result
can be proved in this framework. In particular, the existence of policies
which are optimal or nearly optimal for every initial state is shown.

A great many authors have dealt with measurability in stochastic decision
theory. This section describes three approaches taken and how their aims and
results relate to our own.

I. The General Model

If the state, control and disturbance spaces are arbitrary measure
spaces, very little can be done. One attempt in this direction is the work of
Striebel [36] involving p-essential infima. Geared toward giving meaning to
the dynamic programming algorithm, this work replaces (1.5) by

(1.6)  $J_{k+1}(x) = p_k\text{-essential infimum}_{\mu}\; E\{ g(x, \mu(x), w) + J_k[f(x, \mu(x), w)] \}$,

$k = 0, \ldots, N-1$, where the p-essential infimum is over all measurable $\mu$
from state space $S$ to control space $C$ satisfying any constraints which may
have been imposed. The functions $J_k$ are measurable, and if the probability
measures $p_0, \ldots, p_{N-1}$ are properly chosen and the so-called countable
$\epsilon$-lattice property holds, this modified dynamic programming algorithm
generates the optimal cost-to-go functions and can be used to obtain policies
which are optimal or nearly optimal for $p_{N-1}$-almost all initial states. The
selection of the proper probability measures $p_0, \ldots, p_{N-1}$, however, is at
least as difficult as executing the dynamic programming algorithm and the
verification of the countable $\epsilon$-lattice property is equivalent to proving
the existence of an $\epsilon$-optimal policy. In contrast to Striebel's work, we
will impose a Borel space structure on the model which enables us to obtain
significantly stronger results.

II. The Semicontinuous Models

Considerable attention has been directed toward models in which the state
and control spaces are Borel spaces or even $R^n$, and the reduced cost function

$g(x, u) = \int g'(x, u, w)\, p(dw | x, u)$

has lower semicontinuity and/or convexity properties. A companion assumption
is that the mapping

$x \to U(x)$

is a measurable closed-valued multifunction [27]. In the latter case there
exists a Borel measurable selector $\mu : S \to C$ such that $\mu(x) \in U(x)$ for
every state $x$ (Kuratowski and Ryll-Nardzewski [18]). This is, of course,
necessary if any policy is to exist at all.

The main fact regarding models of this type is that if $g$ is lower
semicontinuous, $S$ and $C$ are compact, and $x \to U(x)$ is closed-valued and
measurable, then the functions $J_k$ defined by (1.4) and (1.5) are lower
semicontinuous, the infimum in (1.5) is achieved for every $x$ and $k$, and there
are Borel measurable selectors $\mu_0, \ldots, \mu_{N-1}$ such that $\mu_k(x)$
achieves this infimum. The policy $(\mu_0, \ldots, \mu_{N-1})$ is optimal. This
existence of an optimal policy is often an additional benefit of imposing
topological conditions to insure that the problem is well-defined. For results
in this direction, see Maitra [21], Schael [30]-[33], and Freedman [13]. Part
of this thesis will deal with assumptions of this nature, not in order to
resolve measure theoretic questions, but rather to give easily verifiable
conditions on the system equation model which guarantee the existence of an
optimal policy. In Chapter 5 we will show that these conditions also guarantee
convergence of the dynamic programming algorithm over an infinite horizon to
the optimal cost function and that this algorithm can be used to generate an
optimal stationary policy. This result generalizes the work of Maitra [21] in
that we need not assume the one-stage cost is bounded and the discount factor
is less than one.
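
For orientation, the classical bounded, discounted setting (case (D) of the
abstract) is the one in which convergence of the infinite horizon algorithm is
easiest to observe. The sketch below assumes a finite toy inventory model and
relies only on the standard fact that the discounted dynamic programming
operator is a sup-norm contraction with modulus $\alpha$; it does not reproduce
the sharper bounds or the more general conditions of the thesis. The names
`bellman` and `value_iteration` are hypothetical.

```python
def bellman(J, states, U, f, g, p, alpha):
    """Apply the discounted dynamic programming operator once to J."""
    return {
        x: min(
            sum(prob * (g(x, u, w) + alpha * J[f(x, u, w)])
                for w, prob in p(x, u).items())
            for u in U(x)
        )
        for x in states
    }


def value_iteration(states, U, f, g, p, alpha, iterations):
    """Iterate from J = 0 and record sup-norm gaps between successive iterates."""
    J = {x: 0.0 for x in states}
    gaps = []
    for _ in range(iterations):
        J_new = bellman(J, states, U, f, g, p, alpha)
        gaps.append(max(abs(J_new[x] - J[x]) for x in states))
        J = J_new
    return J, gaps


# Hypothetical toy inventory model (an assumption made for illustration).
STATES = [0, 1, 2]


def U(x):
    return range(0, 3 - x)


def f(x, u, w):
    return max(x + u - w, 0)


def g(x, u, w):
    return u + 2 * max(w - x - u, 0)


def p(x, u):
    return {0: 0.5, 1: 0.5}


ALPHA = 0.9
J_limit, gaps = value_iteration(STATES, U, f, g, p, ALPHA, 300)
```

Each recorded gap is at most `ALPHA` times the previous one, so the iterates
converge geometrically to a fixed point of the operator in this bounded case.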


Of the above mentioned papers, the one most comparable to the part of our
own work that utilizes semicontinuity assumptions is Schael [32], which
reaches the same conclusions we do under conditions in some ways more general
and in other ways more restrictive than our own. In contrast to that work,
our development does not appeal to the Kuratowski-Ryll-Nardzewski selection
theorem [18], and so we can consider a more general control constraint. In
particular, the model with a continuous transition kernel, a positive definite
quadratic cost function on a finite dimensional Euclidean space, and no
control constraint fits our framework but not that of [32]. This model
provides the motivation for our line of analysis and is discussed more
specifically in the next section.


Continuity and compactness assumptions are integral to much of the work
that has been done in stochastic programming. This work differs from our own
in both its aims and its framework. First of all, in the usual stochastic
programming model, the decision maker cannot influence the distribution of
future states (see Olsen [23]-[25], Rockafellar and Wets [28], [29], and the
references contained therein). Secondly, assumptions of convexity, lower
semicontinuity or both are made on the cost function, the model is designed
for the Kuratowski-Ryll-Nardzewski selection theorem, and the analysis is
carried out in a finite dimensional Euclidean state space. All of this is for
the purpose of overcoming measurability problems. Results are not readily
generalizable beyond Euclidean spaces (Rockafellar [27]). The thrust of the
work is toward convex programming type results, i.e. duality and Kuhn-Tucker
conditions for optimality, and so very specific structure of control and even
state constraints is assumed and powerful results are obtained.

Our work, by admitting less structure, can be done in Euclidean spaces
and immediately generalized to Borel spaces, all of which are Borel isomorphic
to Borel subsets of [0,1]. The need for this flexibility will become apparent
in Chapter 6 when we take conditional probabilities as the states of a model.
The less structured model is, of course, consistent with our desire to present
a general framework for stochastic decision theory. Our control constraint is
of the form

$U(x) = \{u : (x, u) \in \Gamma\} = \Gamma_x$,

where $\Gamma$ is an analytic subset of $SC$ and $\Gamma_x \ne \emptyset$ for every
$x$. It can be shown that if $x \to U(x)$ is a closed-valued measurable
multifunction, then $\Gamma = \{(x, u) : u \in U(x)\}$ is analytic, indeed Borel.
(See [27], Theorem 1E for the case when the control space is $R^n$. The proof
found there can be generalized.) Thus we have generalized this constraint to a
case where $U(x)$ need not be closed for each $x$.

III. The Borel Models

The Borel space framework was introduced by Blackwell [5] and further
refined by Strauch, Dynkin, Juskevic, Hinderer and others [35, 12, 16, 21, 6,
13]. The state and control spaces $S$ and $C$ were assumed to be Borel spaces,
and the functions defining the model were assumed to be Borel measurable.
Initial efforts were directed toward proving the existence of "nice" optimal
or nearly optimal policies in this framework. Policies were required to be
Borel measurable.
Under these conditions it is possible to prove the universal
measurability of the optimal cost function and the existence for every
$\epsilon > 0$ and probability measure $p$ on $S$ of a p-$\epsilon$-optimal policy
(Strauch [35], Theorems 7.1 and 8.1). A p-$\epsilon$-optimal policy is one which
leads to a cost which differs from the optimal cost by less than $\epsilon$ for
p-almost every initial state. Even over a finite horizon the optimal cost
function need not be Borel measurable, and there need not exist an everywhere
$\epsilon$-optimal policy (Blackwell [5], Example 2). The difficulty arises from
the inability to choose a Borel measurable function $\mu_k : S \to C$ which
nearly achieves the infimum in (1.5) uniformly in $x$. The nonexistence of such
a function interferes with the construction of optimal policies via the
dynamic programming algorithm (1.4) and (1.5), since one must first determine
at each stage the measure $p$ with respect to which it is satisfactory to
nearly achieve the infimum in (1.5) for p-almost every $x$. This is essentially
the same problem encountered with (1.6). The difficulties in constructing
nearly optimal policies over an infinite horizon are more acute. Furthermore,
from an applications point of view a p-$\epsilon$-optimal policy, even if it can
be constructed, is a much less appealing object than an everywhere
$\epsilon$-optimal policy, since in many situations the distribution $p$ is
unknown or may change when the system is operated repetitively, in which case
a new p-$\epsilon$-optimal policy must be computed.

The main qualitative result of this thesis is that if the class of
admissible policies in the Borel model is enlarged to include all universally
measurable policies, then the existence of everywhere $\epsilon$-optimal policies
can be assured and, if the infimum in the dynamic programming algorithm (1.5)
is attained for every $x$ and $k$, then an everywhere optimal policy exists. Thus
the notion of p-optimality can be dispensed with. The only other work of a
similar nature is that of Blackwell, Freedman and Orkin [6], who extended the
class of admissible policies to those which are analytically measurable.
Under our assumption of a nonpositive cost, they proved for any $\epsilon > 0$ the
existence of an everywhere $\epsilon$-optimal policy which, at stage $k$, chooses
control $u_k$ dependent on the entire history $(x_0, u_0, \ldots, u_{k-1}, x_k)$,
i.e. has the form $\pi = (\mu_0, \ldots, \mu_{N-1})$ or $\pi = (\mu_0, \mu_1, \ldots)$,
where $\mu_k$ is of the form (1.3). We prove in Corollary 3.2.2 that when
universally measurable policies are allowed, then under the same assumption of
a nonpositive cost, an $\epsilon$-optimal semi-Markov policy exists, i.e. one whose
controls have the form $\mu_k(x_0, x_k)$. Thus the intermediate states and
controls can be forgotten. We also provide an example to the effect that
without further assumptions, the dependence on $x_0$ is necessary.

We would like to point out that while this thesis uses universally
measurable policies to prove results which hold everywhere, one can obtain the
former results which allow only Borel measurable policies and hold p-almost
everywhere as corollaries. This follows from the following observation, whose
proof we sketch shortly.

(1.7)  If $X$ and $Y$ are Borel spaces, $p_0, p_1, \ldots$ is a sequence of
       probability measures on $X$, and $\mu$ is a universally measurable map
       from $X$ to $Y$, then there is a Borel measurable map $\mu'$ from $X$ to
       $Y$ such that

       $\mu(x) = \mu'(x)$

       for $p_k$-almost every $x$, $k = 0, 1, \ldots$


As an example of how this observation can be used to obtain p-almost
everywhere existence results from ours, consider Theorem 5.12. It states in
part that if $\epsilon > 0$ and the discount factor $\alpha$ is less than one, then
an $\epsilon$-optimal nonrandomized stationary policy exists, i.e. a policy
$\pi = (\mu, \mu, \ldots)$ where $\mu$ is a universally measurable mapping from $S$
to $C$. Given $p_0$ on $S$, this policy generates a sequence of measures
$p_1, p_2, \ldots$ on $S$, where $p_k$ is the distribution of the k-th state when
the initial state has distribution $p_0$ and the policy $\pi$ is used. Let
$\mu'$ be Borel measurable and equal to $\mu$ for $p_k$-almost every $x$,
$k = 0, 1, \ldots$. Let $\pi' = (\mu', \mu', \ldots)$. Then it can be shown that for
$p_0$-almost every initial state, the cost corresponding to $\pi'$ equals the
cost corresponding to $\pi$, so $\pi'$ is a $p_0$-$\epsilon$-optimal nonrandomized
stationary Borel measurable policy. The existence of such a $\pi'$ is a new
result. This type of argument can be applied to all the existence results of
Chapters 3 and 5.

We now sketch a proof of (1.7). Assume first that $Y$ is a Borel subset of
[0,1]. Then for $r \in [0,1]$, $r$ rational, the set

$U(r) = \{x : \mu(x) \le r\}$

is universally measurable. For every $k$, let $p_k^*[U(r)]$ be the outer measure
of $U(r)$ with respect to $p_k$ and let $B_{k1} \supset B_{k2} \supset \cdots$ be a
decreasing sequence of Borel sets containing $U(r)$ such that

$p_k^*[U(r)] = p_k\left[ \bigcap_{j=1}^{\infty} B_{kj} \right]$.

Let $B(r) = \bigcap_{k=0}^{\infty} \bigcap_{j=1}^{\infty} B_{kj}$. Then

$p_k[U(r)] = p_k[B(r)]$,  $k = 0, 1, \ldots$,

and the argument of Lemma 2.1 applies.

If $Y$ is an arbitrary Borel space, it is Borel isomorphic to a Borel
subset of [0,1] and (1.7) follows.

This section summarizes the remainder of the thesis in a
chapter-by-chapter fashion. New theorems will be indicated. If a result is
shown to hold for every initial state in our model, whereas it was previously
known to hold for p-almost every initial state when policies were required to
be Borel measurable, we will say the "everywhere nature" of the result is new.
The logical order of the presentation is Appendix A, Appendix B, Appendix C,
and then Chapters 2 through 6.


Chapter 2 and the appendices, independent of the stochastic decision
models, present the pertinent mathematics. No genuinely new results are
obtained but rather slight extensions of existing results to cover the cases
at hand are proved. Section 1 of Chapter 2 collects most of the notation and
conventions used. Section 2 states and proves some elementary facts about
universally measurable extended real-valued functions and universally
measurable stochastic kernels. In Section 3 it is shown that infimizing a
bivariate lower semianalytic function over one variable results in a lower
semianalytic function. Theorem 2.4 establishes that when an extended
real-valued lower semianalytic function is integrated against a Borel
measurable stochastic kernel, the resulting function is lower semianalytic.
These two facts will be used to guarantee that the functions generated by the
dynamic programming algorithm are lower semianalytic. Theorem 2.5 relates to
measurable selection of extremals and generalizes a result of Brown and Purves
[7] in that it applies to minimization of lower semianalytic functions rather
than Borel measurable functions. The theory of the remaining chapters rests
on this theorem. It guarantees that for $\epsilon > 0$, universally measurable
functions $\mu_k : S \to C$ exist such that if the infimum in (1.5) is achieved
at $x$, then $\mu_k(x)$ achieves it, and otherwise $\mu_k(x)$ comes within
$\epsilon$ of achieving it. Section 4 develops the analogous facts about lower
semicontinuous functions, except that under conditions of lower semicontinuity
and compactness the infimum in (1.5) is always achieved and the selector which
achieves it can be chosen to be Borel measurable.

Chapter 3 is devoted to definition and analysis of the finite horizon
stochastic decision model. Section 1 defines the model. Section 2 defines
the dynamic programming algorithm in terms of the operator X and shows that
the algorithm generates the optimal cost function (Theorem 3.2). We then use
the algorithm to construct $\epsilon$-optimal policies in Corollary 3.2.2 and a
sequence of policies exhibiting $\{\epsilon_n\}$ dominated convergence to
optimality in Corollary 3.2.3. These corollaries and Example 3.1 show that the
strongest possible structural results hold. The everywhere nature of the (F+)
result is new and the (F-) results are completely new.

As remarked before, our framework is designed so that when the infimum in
the dynamic programming algorithm is attained for every $x$ and $k$, an optimal
policy exists. This is the statement of Theorem 3.3, which is new.
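As an illustration only, the following minimal sketch runs the dynamic
programming algorithm on a model with finitely many states, controls and
disturbances, where the infimum is trivially attained at every stage and all
measurability questions disappear. The function name `backward_induction`,
the zero terminal cost, and the small inventory example are assumptions made
for the sketch; they are not taken from the thesis.

```python
import math


def backward_induction(states, controls, disturbances, f, g, N):
    """Finite-horizon dynamic programming on finite sets.

    controls(x)        -> iterable of admissible controls U(x)
    disturbances(x, u) -> list of (w, probability) pairs
    f(x, u, w)         -> next state
    g(x, u, w)         -> one-stage cost
    Returns (J, mu): J[k][x] is the optimal cost-to-go from stage k,
    mu[k][x] a minimizing control (a nonrandomized Markov policy).
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = 0.0  # zero terminal cost (assumption of this sketch)
    for k in range(N - 1, -1, -1):
        for x in states:
            best_u, best_val = None, math.inf
            for u in controls(x):
                # expected stage cost plus expected cost-to-go
                val = sum(p * (g(x, u, w) + J[k + 1][f(x, u, w)])
                          for w, p in disturbances(x, u))
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x], mu[k][x] = best_val, best_u
    return J, mu


# Hypothetical inventory example: stock x in {0, 1, 2}, order u, demand w.
def demo_controls(x):
    return [u for u in (0, 1) if x + u <= 2]


def demo_disturbances(x, u):
    return [(0, 0.5), (1, 0.5)]


def demo_f(x, u, w):
    return max(x + u - w, 0)


def demo_g(x, u, w):
    # ordering cost + shortage penalty + holding cost
    return 1.0 * u + 3.0 * max(w - x - u, 0) + 0.5 * max(x + u - w, 0)


J, mu = backward_induction([0, 1, 2], demo_controls, demo_disturbances,
                           demo_f, demo_g, N=3)
```

Because every minimum is attained, the table `mu` is an optimal nonrandomized
Markov policy for this toy model; the general Borel-space case is what
requires the selection theorems of Chapter 2.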

Theorem 3.4 gives a set of easily verifiable conditions which guarantee
the existence of an optimal policy. Note that if $S=R^n$, $C=R^m$, $\Gamma=SC$,
$g(x,u)=x'Qx+u'Ru$, where $Q$ is a positive semidefinite matrix and $R$ is a
positive definite matrix, then by replacing $C$ by its one point
compactification and taking $\Gamma^j=\{(x,u): u'u<j\}$, conditions
(3.1b)-(3.1d) are satisfied. This justifies our earlier remark concerning the
motivation behind these assumptions.

Chapter 4 shows in detail how the stochastic decision model can be
converted to a deterministic one. Although the concept is not new (see
Witsenhausen Lpbj), to the author's knowledge this has not been done in this
systematic manner before. The conversion is carried through only for the
infinite horizon model, as it is not necessary for the development in Chapter
3. It is also done only under assumptions (P), (N), or (D), although the
models make sense under conditions of summability similar to those of Chapter
3. The models (P) and (N) are fundamentally different, and as the analysis
proceeds, it would be necessary to make stronger assumptions than summability
in order to more clearly differentiate the two cases. This is worked out in
some detail in [12].
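To give a rough idea of what such a conversion looks like, here is a minimal
sketch for a finite state space. It is an illustration under simplifying
assumptions, not the construction of Chapter 4: the "state" of the
deterministic model is taken to be the distribution of the stochastic state,
and a nonrandomized decision rule plays the role of the control. The names
`det_step`, `demo_trans` and `demo_cost` are hypothetical.

```python
def det_step(dist, rule, trans, cost):
    """One step of a deterministic model induced by a stochastic one.

    dist  : dict state -> probability (the deterministic model's state)
    rule  : dict state -> control     (the deterministic model's control)
    trans : function (x, u) -> dict next_state -> probability
    cost  : function (x, u) -> expected one-stage cost
    Returns the next distribution and the expected stage cost.
    """
    next_dist = {}
    stage_cost = 0.0
    for x, px in dist.items():
        u = rule[x]
        stage_cost += px * cost(x, u)
        for y, pxy in trans(x, u).items():
            next_dist[y] = next_dist.get(y, 0.0) + px * pxy
    return next_dist, stage_cost


# Hypothetical two-state example.
def demo_trans(x, u):
    return {0: 0.5, 1: 0.5} if u == "mix" else {x: 1.0}


def demo_cost(x, u):
    return float(x) + (0.25 if u == "mix" else 0.0)


dist0 = {0: 0.0, 1: 1.0}
dist1, c0 = det_step(dist0, {0: "stay", 1: "mix"}, demo_trans, demo_cost)
```

The evolution of `dist` involves no randomness at all, which is the sense in
which results for deterministic models can be carried over.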

The conversion of the stochastic model to the deterministic one plays a
central role in the analysis of the infinite horizon problem in Chapter 5,
where structural results are applied to the deterministic model and then
transferred to the stochastic model. This line of analysis is economical and
results in exceedingly simple proofs of some otherwise difficult theorems.
One such is given as Theorem 7.1 of Strauch [35] and here as Corollary 4.5.1,
namely, that the optimal cost function across an infinite horizon is lower
semianalytic. Another such result is the validity of the optimality equation
given as Theorem 6.2 in Strauch [35] and as Theorem 5.1 here. The analysis
also shows how results for stochastic models with measurability restrictions
on the set of admissible policies can be obtained from general results on
abstract dynamic programming models based on monotone mappings such as those
of Denardo [6] and Bertsekas [4].

Chapter 5 begins with the well-known optimality equation for infinite
horizon control and derives the similar functional equation for the cost
corresponding to a stationary policy. In Theorem 5.3, existing structural
properties relating the optimal cost function $J^*$ to the $T$ operator are
proved for the Borel model. This theorem is new, as is its companion Theorem
5.4. Theorem 5.4 enables us to establish the necessary and sufficient
condition for a stationary policy to be optimal under (P) and (D) found in
Theorem 5.5. This condition is also found in Theorem 5.3 of Schael [33]. This
condition can be applied as a test of optimality only if a stationary policy
is already given, but we extend it to a necessary and sufficient condition for
an optimal nonrandomized stationary policy to exist in Corollary 5.5.1. The
corollary is new; indeed it is not true if policies are restricted to be Borel
measurable. Theorem 5.6 gives a new but less satisfying condition for a
stationary policy to be optimal under (N). An extension such as was done in
Corollary 5.5.1 is not possible under (N).

Theorem 5.7 states that under (N) and (D) the finite horizon optimal cost
functions converge to the infinite horizon optimal cost function as the
horizon tends to infinity, and in Theorem 5.8 sharp bounds on the rate of
convergence are provided under (D). This convergence does not always occur
under (P) (see Strauch [35], Example b.1 or Bertsekas [3], Chapter b, Problem
A). Conditions equivalent to this convergence are given in Theorem 5.9.
Theorems 5.7 - 5.9 are new for the Borel model. In Theorem 5.10 it is shown
that the compactness of certain level sets in the control space implies the
equivalent conditions of Theorem 5.9 and the existence of a nonrandomized
stationary optimal policy. As corollaries of this theorem, we see that if
$U(x)$ is finite for each $x$ or if the continuity and compactness conditions
imposed in Chapter 3 to guarantee existence of an optimal policy over a finite
horizon are satisfied, then the conclusion of Theorem 5.10 holds. Theorem
5.11 indicates how, under these conditions, a nonrandomized stationary optimal
policy can be obtained as a limit of policies optimal over a finite horizon.
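The convergence of finite horizon costs under (D) can be observed numerically
on a finite model. The sketch below is illustrative only: it iterates the
dynamic programming operator from the zero function and compares the gap to
the infinite horizon cost with the standard geometric bound
$\alpha^N b/(1-\alpha)$, valid when $|g| \le b$. That bound is the familiar
contraction estimate and is not claimed to be the exact statement of Theorem
5.8; the function name `value_iteration` and the one-state example are
assumptions.

```python
def value_iteration(states, controls, trans, cost, alpha, n_stages):
    """Return J_N, obtained from J_0 = 0 by N applications of the DP operator.

    controls(x) -> iterable of admissible controls
    trans(x, u) -> dict next_state -> probability
    cost(x, u)  -> expected one-stage cost, assumed bounded
    """
    J = {x: 0.0 for x in states}
    for _ in range(n_stages):
        J = {
            x: min(
                cost(x, u)
                + alpha * sum(p * J[y] for y, p in trans(x, u).items())
                for u in controls(x)
            )
            for x in states
        }
    return J


# One state, one control, cost 1 per stage: J* = 1 / (1 - alpha).
alpha, b = 0.9, 1.0
J_star = 1.0 / (1.0 - alpha)
gaps = {}
for n in (1, 10, 50):
    J_n = value_iteration([0], lambda x: ["u"], lambda x, u: {0: 1.0},
                          lambda x, u: 1.0, alpha, n)
    gaps[n] = (abs(J_star - J_n[0]), alpha ** n * b / (1.0 - alpha))
```

In this particular example the gap equals the bound at every horizon, so the
geometric estimate cannot be improved in general.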

Theorem 5.12 is the basic existence result for $\varepsilon$-optimal policies
under (P) and (D). The everywhere nature of the existence of an
$\varepsilon$-optimal nonrandomized Markov policy is new. The remainder of the
theorem is completely new. Theorem 5.13 deals with case (N). This theorem is
new both structurally and measure theoretically, the strongest previous result
in this area being contained in Blackwell, Freedman and Orkin [6]. An
interesting result in this area is the following due to Frid [14]: If under
(N), the optimal value function $J^*$ is everywhere finite, $0<\lambda<1$ and
$p$ is a probability measure on $S$, then a nonrandomized stationary policy
$\mu$ exists satisfying for $p$-almost every $x$

$$J_\mu(x) \le \lambda J^*(x).$$

In particular, if $J^*$ is bounded and $\varepsilon>0$, a nonrandomized
stationary $p$-$\varepsilon$-optimal policy exists. We have not been able to
establish an everywhere $\varepsilon$-optimal version of this result.

Chapter 6 shows how more general models than that considered thus far can
be reduced to our framework. Section 1 sketches this reduction for the
nonstationary model. The nonstationary optimality equation is given as an
example of the operation of this reduction. We use the nonstationary model in
Theorem 6.2 to prove that for fixed $p$ and $\varepsilon>0$, a weakly
$p$-$\varepsilon$-optimal nonrandomized Markov policy always exists. (Our
"weak $p$-$\varepsilon$-optimality" is the "$p$-$\varepsilon$-optimality"
considered by Hinderer [16].)

Section 2 accomplishes a similar reduction for the model of imperfect
state information. A statistic sufficient for control is defined, and the
values taken by this statistic become the states in the perfect state
information model. Correspondences of costs of policies and optimal costs are
established in Theorems 6.3 and 6.4. A discussion of the finite horizon
dynamic programming algorithm for the imperfect state information model
illustrates the reduction. The remaining theorems show that the identity
mappings on the information vectors constitute a statistic sufficient for
control, as do the mappings of the information vectors into the conditional
distributions of the state.
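For a finite state space, the conditional distribution of the state can be
updated recursively by Bayes' rule, which is what makes it usable as the state
of a perfect information model. The following is a minimal sketch of one such
update under assumed finite transition and observation tables; the names
`belief_update`, `P` and `O` are hypothetical and the thesis treats general
Borel spaces rather than this finite case.

```python
def belief_update(belief, u, z, P, O):
    """One Bayes step for the conditional distribution of the state.

    belief : list, belief[x] = probability of state x given past information
    u      : control applied
    z      : index of the observation received after the transition
    P      : P[u][x][y] = probability of moving from x to y under control u
    O      : O[y][z]    = probability of observing z in state y
    """
    n = len(belief)
    # prediction: distribution of the next state before seeing z
    predicted = [sum(belief[x] * P[u][x][y] for x in range(n))
                 for y in range(n)]
    # correction: weight by the likelihood of the observation
    unnorm = [predicted[y] * O[y][z] for y in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [v / total for v in unnorm]


P = {"a": [[0.9, 0.1], [0.2, 0.8]]}
O = [[0.7, 0.3], [0.1, 0.9]]
new_belief = belief_update([0.5, 0.5], "a", 1, P, O)
```

The updated belief depends on the past only through the previous belief, the
control and the new observation.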


Our definition of statistic sufficient for control is that given by
Striebel L3bJ specialized to our framework, and our reduction of the imperfect
state information model is similar to hers. Our work differs from Striebel's
in that the existence of a statistic sufficient for control is guaranteed by
our assumptions and, once this existence is shown, the entire theory of the
previous chapters can be brought to bear on the imperfect state information
model.

Appendix A presents, mostly without proof, the properties of analytic
sets needed for the thesis.

Appendix B develops properties of the Borel space $P(X)$ of probability
measures on a Borel space $X$. Theorem B.3 characterizes the Borel
$\sigma$-algebra in $P(X)$ independent of the topology on $P(X)$. Such a
theorem for compact $X$ is available in the literature [9, Proposition 3.1]
and has been used for noncompact $X$ (Strauch [35], Blackwell, Freedman and
Orkin [6]), the authors evidently intending an extension of the compact result
by using Urysohn's Theorem to embed $X$ in a compact metric space. The details
of this development have been carried through by Hinderer [16, Theorem 12.13],
while we give an alternate proof. In Theorem B.8, the analyticity of
$\{p \in P(X): p(A)>c\}$ when $X$ is compact and $A$ is analytic is shown to
hold for noncompact $X$ by appealing to Urysohn's Theorem. This result has
also been used previously (Blackwell, Freedman and Orkin [6]).

Appendix C defines and establishes the existence of stochastic kernels in
Borel spaces. The extension of Theorem C.1 in Theorem C.2 to include a
measurable dependence on a parameter is crucial for the development of the
filtering algorithm in Section 2 of Chapter 6. Theorem C.3 characterizes
stochastic kernels as measurable maps from conditioning variable to
probability measure.


Let $Y$ be a topological space. The space $Y$ is complete if there is a
metric $d$ on $Y$ consistent with its topology such that $(Y,d)$ is a complete
metric space. Thus any topological space homeomorphic to a complete space is
itself complete.

If $Y$ is a complete separable (topological) space, we denote by
$\mathcal{B}_Y$ the smallest $\sigma$-algebra containing the open sets in $Y$.
The sets in $\mathcal{B}_Y$ are called the Borel subsets of $Y$. We will often
write $X$ to indicate a set in $\mathcal{B}_Y$.


Definition 2.1 If $X$ is a Borel subset of some complete separable space $Y$,
we say $X$ is a Borel space. We understand $X$ to have the relative topology.


It usually suits our purposes to treat Borel spaces without specifying
the complete space in which they are embedded or the metric which makes that
space complete. By definition a Borel space $X$ is a metrizable topological
space, but no particular metric is specified. The class of Borel subsets of a
Borel space $X$ (denoted $\mathcal{B}_X$) is the smallest $\sigma$-algebra
containing the relatively open sets in $X$. The Borel subsets of $X$ are also
Borel spaces in the sense of Definition 2.1.

If $X$ is a Borel space, $P(X)$ is the set of probability measures on
$(X,\mathcal{B}_X)$ and is a Borel space in its own right (Appendix B). The
set of bounded, continuous, real-valued functions on $X$ will be denoted by
$C(X)$. If $d$ is a metric on $X$ consistent with its topology, then $U_d(X)$
is the set of bounded, real-valued functions on $X$ which are uniformly
continuous with respect to $d$.


The letter $R$ represents the real line. The symbol $R^*$ represents the real
line with $+\infty$ and $-\infty$ adjoined. Defining open neighborhoods of
$+\infty$ as $(c,+\infty]$ and of $-\infty$ as $[-\infty,c)$, $R^*$ becomes a
complete separable space containing $R$ as a Borel subset. The set of rational
numbers is denoted by $Q$, and $Q^*$ is defined analogous to $R^*$.

We follow the usual conventions with regard to ordering and arithmetic in
$R^*$, with the exception that whenever the sums $+\infty-\infty$ and
$-\infty+\infty$ occur, we set them equal to $+\infty$. If $X$ is a set and
$f: X \to R^*$, then $f^+(x)=\max\{0,f(x)\}$ and $f^-(x)=\max\{0,-f(x)\}$. If
$(X,d)$ is a metric space and $x \in X$, $A \subset X$, then
$d(x,A)=\inf_{y \in A} d(x,y)$. If $f$ and $p$ are a measurable extended
real-valued function and a probability measure on a space, respectively, we
define

(2.1) $\int f\,dp = \int f^+\,dp - \int f^-\,dp.$

Under our conventions, this sum is always defined. The integral may not be
linear, but it always holds that

(2.2) $\int (f+g)\,dp \le \int f\,dp + \int g\,dp.$

If $\int f\,dp$ and $\int g\,dp$ are not infinite of different sign, then, of
course, equality holds in (2.2).
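The convention $+\infty-\infty=+\infty$ and definition (2.1) can be tried out
on a discrete probability measure. The sketch below is only an illustration
on a two-point space; the helper names `ext_add` and `integral` are
assumptions, and points of zero mass are skipped to avoid forming
$0\cdot\infty$.

```python
import math

INF = math.inf


def ext_add(a, b):
    """Add extended reals with the convention (+inf) + (-inf) = +inf."""
    if {a, b} == {INF, -INF}:
        return INF
    return a + b


def integral(f, p):
    """Integral (2.1) of f against a discrete probability p (dict point -> mass)."""
    plus = sum(m * max(0.0, f(x)) for x, m in p.items() if m > 0)
    minus = sum(m * max(0.0, -f(x)) for x, m in p.items() if m > 0)
    return ext_add(plus, -minus)


p = {0: 0.5, 1: 0.5}
f_vals = {0: INF, 1: -INF}
g_vals = {0: -1.0, 1: 2.0}

int_f = integral(lambda x: f_vals[x], p)   # inf - inf, taken to be +inf
int_g = integral(lambda x: g_vals[x], p)
int_sum = integral(lambda x: ext_add(f_vals[x], g_vals[x]), p)
```

Here `int_sum` does not exceed `ext_add(int_f, int_g)`, in line with (2.2).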

If $B \subset X$, the function $\chi_B$ is defined to be identically one on
$B$ and zero otherwise. The symmetric difference operator $\Delta$ is defined
by $A \Delta B = (A-B) \cup (B-A)$. The juxtaposition of two or more sets
represents their cross product. For example, $XY$ is the cross product of $X$
and $Y$. Assuming $X$ and $Y$ are Borel spaces and letting
$\mathcal{B}_X\mathcal{B}_Y$ be the product $\sigma$-algebra, by [20], Chapter
1, Theorem 1.10, $\mathcal{B}_X\mathcal{B}_Y$ is equal to $\mathcal{B}_{XY}$.
If a product space $XY$ is given, $\mathrm{proj}_X$ is the projection mapping
onto the $X$-axis. If $D \subset XY$, $D_x$ is the cross section
$\{y \in Y: (x,y) \in D\}$.

The countable cross product of the set of positive integers is denoted by
$N$. We understand $N$ to have the product topology, where the set of positive
integers has the discrete topology. Thus defined, $N$ is a Borel space. The
space of irrationals in $(0,1)$ with the usual topology is denoted by $N^*$
and is homeomorphic to $N$ [19, Section 3(IX)].


f unctions 


In this section we list for reference several known properties of
universally measurable functions. Proofs are given for the convenience of the
reader. For a definition of analytic sets and a short exposition of their
properties, see Appendix A. The symbols $\mathcal{U}_X$ and $\mathcal{A}_X$
will denote the universal and analytic $\sigma$-algebras, respectively, of a
Borel space $X$ (Definitions k.b and A.b).

Definition 2.2 Let $X$ and $Y$ be Borel spaces and $f$ a function mapping
$D \subset X$ into $Y$. If $D \in \mathcal{U}_X$ and
$f^{-1}(B) \in \mathcal{U}_X$ for every $B \in \mathcal{B}_Y$, then $f$ is
universally measurable. If $D \in \mathcal{A}_X$ and
$f^{-1}(B) \in \mathcal{A}_X$ for every $B \in \mathcal{B}_Y$, then $f$ is
analytically measurable.

In a sense, the class of universally measurable extended real-valued
functions is as large a class as we dare consider. If an extended real-valued
function $f$ is not in this class, then there is some probability measure
$p \in P(X)$ such that the integral of $f$ with respect to $p$ cannot be
defined without resort to $p$ outer measure. If $f$ is in this class, then
$\int f\,dp$ is defined by (2.1) and the remarks following Definition A.b and,
provided we take care with the addition of infinities, all the classical
integration theorems are at our disposal.

Lemma 2.1 Let $X$ be a Borel space and $f: X \to R^*$. The function $f$ is
universally measurable if and only if for every $p \in P(X)$ there is a Borel
measurable $f_p: X \to R^*$ such that for $p$-almost every $x$,
$f_p(x) = f(x)$.


Proof:

Suppose $f$ is universally measurable and let $p \in P(X)$ be given. For
$r \in Q^*$, let $U(r) = \{x: f(x) \le r\}$. Then
$f(x) = \inf\{r \in Q^*: x \in U(r)\}$. Let $B(r) \in \mathcal{B}_X$ be such
that $p[B(r) \Delta U(r)] = 0$. Define

$$f_p(x) = \inf\{r \in Q^*: x \in B(r)\} = \inf_{r \in Q^*} \psi_{B(r)}(x),$$

where $\psi_{B(r)}(x) = r$ if $x \in B(r)$ and $\psi_{B(r)}(x)=+\infty$
otherwise. Then $f_p: X \to R^*$ is Borel measurable, and

$$\{x: f(x) \ne f_p(x)\} \subset \bigcup_{r \in Q^*} [B(r) \Delta U(r)]$$

has $p$-measure zero.

Conversely, if given $p \in P(X)$, there is a Borel measurable $f_p$ such that
$f=f_p$ $p$-almost everywhere, then

$$p(\{x: f(x) \le c\} \,\Delta\, \{x: f_p(x) \le c\}) = 0$$

for every $c \in R^*$, and the universal measurability of $f$ follows. QED

Lemma 2.1 can be used to give an equivalent definition of $\int f\,dp$ when
$f$ is a universally measurable extended real-valued function on a Borel space
$X$ and $p \in P(X)$. Letting $f_p$ be as above, we could define

$$\int f\,dp = \int f_p\,dp.$$

Lemma 2.2 Let $X$ and $Y$ be Borel spaces and $q(dy|x)$ a universally
measurable stochastic kernel on $Y$ given $X$ (see Appendix C). Then given
$p \in P(X)$, there is a Borel measurable stochastic kernel $q_p(dy|x)$ such
that for $p$-almost every $x$, $q(\cdot|x)=q_p(\cdot|x)$.

Proof:

Since $Y$ is separable and metrizable, the topology in $Y$ can be generated
by a countable basis of open neighborhoods
$\mathcal{O} = \{O_1, O_2, \ldots\}$. Therefore $\mathcal{O}$ generates
$\mathcal{B}_Y$. Let $\mathcal{F}$ be the class of sets in $\mathcal{O}$ and
their finite intersections. For $F \in \mathcal{F}$, let $f_F$ be a Borel
measurable function for which

$$f_F(x) = q(F|x), \quad x \in B_F,$$

where $B_F$ is a Borel measurable set with $p$-measure one. Such an $f_F$ and
$B_F$ exist by Lemma 2.1. For $x \in \bigcap_{F \in \mathcal{F}} B_F$, let
$q_p(\cdot|x)=q(\cdot|x)$. For $x \notin \bigcap_{F \in \mathcal{F}} B_F$, let
$q_p(\cdot|x)$ be some fixed probability measure in $P(Y)$. The class of sets
$\underline{Y}$ in $\mathcal{B}_Y$ for which $q_p(\underline{Y}|x)$ is Borel
measurable in $x$ is a Dynkin system containing $\mathcal{F}$. The class
$\mathcal{F}$ is closed under finite intersections and generates
$\mathcal{B}_Y$. The lemma follows from the Dynkin system theorem [1, Theorem
4.1.2]. QED

Theorem 2.1 Let $X$ and $Y$ be Borel spaces and let $f: XY \to R^*$ be
universally measurable. Let $q(dy|x)$ be a universally measurable stochastic
kernel on $Y$ given $X$. Then the mapping

$$x \to \int f(x,y)\,q(dy|x)$$

is universally measurable from $X$ to $R^*$.

Proof:

Given $p \in P(X)$, there is a Borel measurable stochastic kernel $q_p(dy|x)$
such that $q_p(\cdot|x) = q(\cdot|x)$ for $p$-almost every $x$. Define a
measure $r$ on $XY$ by specifying it on measurable rectangles to be [1,
Theorem 2.6.2]

$$r(\underline{X}\,\underline{Y}) = \int_{\underline{X}} q_p(\underline{Y}|x)\,p(dx).$$

Let $f_p$ be a Borel measurable function such that

$$f_p(x,y) = f(x,y)$$

for $r$-almost every $(x,y)$. Then for every $E \in \mathcal{B}_X$,

$$\int_E \int_Y f_p(x,y)\,q_p(dy|x)\,p(dx) = \int_E \int_Y f(x,y)\,q_p(dy|x)\,p(dx).$$

This implies

$$\int_Y f_p(x,y)\,q_p(dy|x) = \int_Y f(x,y)\,q_p(dy|x) = \int_Y f(x,y)\,q(dy|x)$$

for $p$-almost every $x$. The left hand side is Borel measurable by (2.1),
Corollary B.3.1 and Theorem C.3, so the right hand side is universally
measurable by Lemma 2.1. QED

Theorem 2.2 Let $X$ and $Y$ be Borel spaces, $D$ a universally measurable
subspace of $X$, and $f: D \to Y$ a universally measurable function. If
$U \subset Y$ is universally measurable, then $f^{-1}(U)$ is universally
measurable.

Proof:

Let $m$ be a finite measure on $X$ and define a measure $m'$ on $Y$ by

$$m'(B) = m[f^{-1}(B)], \quad B \in \mathcal{B}_Y.$$

Since $U \in \mathcal{U}_Y$, there exists a Borel set $B \subset Y$ such that

$$m[f^{-1}(U) \,\Delta\, f^{-1}(B)] = m'[U \Delta B] = 0.$$

The set $f^{-1}(B)$ is in $\mathcal{U}_X$, and so there exists a Borel set
$C \subset X$ such that $m[C \,\Delta\, f^{-1}(B)]=0$. Then

$$m[C \,\Delta\, f^{-1}(U)] = 0,$$

so $f^{-1}(U)$ is in the completion of $\mathcal{B}_X$ with respect to $m$. QED

Corollary 2.2.1 Let $X$, $Y$ and $Z$ be Borel spaces and $f: X \to Y$,
$g: Y \to Z$ universally measurable. Then $g \circ f$ is universally
measurable.

Corollary 2.2.2 Let $X$, $Y$ and $Z$ be Borel spaces and $f: X \to Y$,
$g: Y \to Z$ analytically measurable. Then $g \circ f$ is universally
measurable.

One might speculate that under the hypotheses of Corollary 2.2.2, the
composition $g \circ f$ is analytically measurable. This is apparently an open
question. If it could be answered in the affirmative, then the selector in
Theorem 2.5 could be taken to be analytically measurable and analytically
measurable policies (Definition q.q) would suffice for the analysis of
Chapters 3 - 6.

Section 3. Lower semianalytic functions and the key selection theorem

Definition 2.3 Let $X$ be a Borel space and $f: X \to R^*$. If
$\{x: f(x)<c\}$ is analytic for every $c \in R$, $f$ is said to be lower
semianalytic.

Note that as a consequence of Theorem A.2, $f$ is lower semianalytic if and
only if $\{x: f(x)<c\}$ is analytic for every $c \in R^*$. Also
$\{x: f(x)<c\}$ is analytic for every $c \in R^*$ if and only if
$\{x: f(x) \le c\}$ is. If $f$ and $g$ are lower semianalytic, then for
$c \in R$,

$$\{x: f(x)+g(x)<c\} = \bigcup_{r \in Q} \{x: f(x)<r,\ g(x)<c-r\}$$

is analytic, so $f+g$ is lower semianalytic. This is true even if
$f(x)+g(x) = +\infty-\infty$, which by convention we take to be $+\infty$. As
shown by the next theorem, lower semianalytic functions can be characterized
as those functions obtained by infimizing bivariate Borel functions in one of
their variables. More importantly, if a bivariate lower semianalytic function
is infimized in one of its variables, the result is again lower semianalytic.
It is this closure under infimization which makes the dynamic programming
algorithm of the subsequent chapters possible.

Theorem 2.3 Let $X$ and $Y$ be Borel spaces and $f: XY \to R^*$ be lower
semianalytic. Then the function $\inf_{y \in Y} f(x,y)$ mapping $X$ into $R^*$
is lower semianalytic. Conversely, any lower semianalytic $g: X \to R^*$ is of
the form

$$g(x) = \inf_{z \in N} f(x,z),$$

where $f: XN \to R^*$ is Borel measurable.

Proof:

For the first part of the theorem, observe that for $c \in R$,

$$\{x: \inf_y f(x,y) < c\} = \mathrm{proj}_X\{(x,y): f(x,y) < c\}$$

is analytic by Corollary A.3.1.

For the second part of the theorem, let $g: X \to R^*$ be lower semianalytic.
For $r \in Q$, let $A(r)=\{x: g(x)<r\}$. Then $A(r)$ is analytic and by Theorem
A.7, $A(r) = \mathrm{proj}_X F(r)$, where $F(r)$ is a closed set in $XN$.
Define

$$G(r) = \bigcup_{r'<r} F(r')$$

and

$$f(x,z) = \inf\{r: (x,z) \in G(r)\} = \inf_{r \in Q} \psi_{G(r)}(x,z),$$

where $\psi_{G(r)}(x,z) = r$ if $(x,z) \in G(r)$ and
$\psi_{G(r)}(x,z) = +\infty$ otherwise. The function $f$ is Borel measurable.

If $g(x)<c$ for some $c \in R$, then there exists $r \in Q$ for which
$g(x)<r<c$, and so $x \in A(r)$. There exists $z \in N$ such that
$(x,z) \in G(r)$, and consequently $\inf_{z \in N} f(x,z) \le r<c$. This shows
$\inf_{z \in N} f(x,z)$ cannot be greater than $g(x)$.

If $\inf_{z \in N} f(x,z)<c$ for some $c \in R$, then there exists $r \in Q$
for which $\inf_{z \in N} f(x,z)<r<c$ and $(x,z) \in G(r)$. Thus for some
$r' \in Q$, $r'<r$, we have $(x,z) \in F(r')$ and $x \in A(r')$. This implies
$g(x)<r'<r<c$, which shows $g(x)$ cannot be greater than
$\inf_{z \in N} f(x,z)$. QED
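The identity used in the first part of the proof, that the strict sublevel set
of the infimum is the projection of the strict sublevel set of $f$, can be
checked by brute force on finite grids. The sketch below is only a sanity
check on a hypothetical function; with finite $Y$ the infimum is a minimum, so
none of the measurability content of the theorem is visible here.

```python
def proj_X(D):
    """Projection of a set of pairs onto its first coordinate."""
    return {x for x, _ in D}


X = range(4)
Y = range(5)


def f(x, y):
    # hypothetical bivariate function on the grid
    return (x - 2) ** 2 + abs(y - x)


c = 2
# {x : inf_y f(x, y) < c}
left = {x for x in X if min(f(x, y) for y in Y) < c}
# proj_X {(x, y) : f(x, y) < c}
right = proj_X({(x, y) for x in X for y in Y if f(x, y) < c})
```

Both sets come out equal, as the identity requires.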

Theorem 2.4 Let $X$ and $Y$ be Borel spaces, $f: XY \to R^*$ lower
semianalytic, and $q(dy|x)$ a Borel measurable stochastic kernel on $Y$ given
$X$. Then the function

$$x \to \int f(x,y)\,q(dy|x)$$

is lower semianalytic.

Proof:

Suppose $f \ge 0$. Let $f_n(x,y)=\min\{n,f(x,y)\}$. Then each $f_n$ is lower
semianalytic and $f_n \uparrow f$. The set

$$E_n = \{(x,y,b): f_n(x,y) \le b \le n\}
      = \bigcap_{k=1}^{\infty} \bigcup_{r \in Q} \{(x,y,b): f_n(x,y) \le r,\ r \le b + 1/k \le n + 1/k\}$$

is analytic in $XYR$ by Theorem A.2. Let $\lambda$ be Lebesgue measure on $R$,
$p \in P(XY)$ and $p\lambda$ the product measure on $XYR$. By Fubini's Theorem,

$$(p\lambda)(E_n) = \int_{XY}\int_R \chi_{E_n}\,d\lambda\,dp
                  = \int_{XY} [n - f_n(x,y)]\,dp
                  = n - \int_{XY} f_n(x,y)\,dp.$$

For $c \in R$,

$$\{p \in P(XY): \int f(x,y)\,dp \le c\}
  = \bigcap_{n=1}^{\infty} \{p \in P(XY): \int f_n(x,y)\,dp \le c\}
  = \bigcap_{n=1}^{\infty} \{p \in P(XY): (p\lambda)(E_n) \ge n-c\}.$$

Define the mappings

$$\sigma: x \to q(\cdot|x),$$

$$T_n: p \to p\lambda_n,$$

where $\lambda_n$ is $\lambda$ restricted to $[0,n]$. By Theorem C.3, $\sigma$
is Borel measurable. The mappings

$$p \to p(A)\lambda_n(B), \quad A \in \mathcal{B}_{XY},\ B \in \mathcal{B}_{[0,n]}$$

are Borel measurable, so the $T_n$ mappings are also (Corollary B.3.3).¹ The
set

$$\{x: \int f(x,y)\,q(dy|x) \le c\}
  = \bigcap_{n=1}^{\infty} \sigma^{-1} T_n^{-1} \{q \in P(XY[0,n]): q(E_n) \ge n-c\}$$

is analytic by Theorems A.2, A.3 and B.8.


Suppose $f \le 0$. Let $f_n(x,y)=\max\{-n,f(x,y)\}$. Then each $f_n$ is lower
semianalytic and $f_n \downarrow f$. The sets
$E_n = \{(x,y,b): f_n(x,y) \le b \le 0\}$ are analytic and

$$(p\lambda)(E_n) = \int_{XY}\int_R \chi_{E_n}\,d\lambda\,dp = -\int_{XY} f_n(x,y)\,dp.$$

For $c \in R$,

$$\{p \in P(XY): \int f(x,y)\,dp<c\}
  = \bigcup_{n=1}^{\infty} \{p \in P(XY): \int f_n(x,y)\,dp<c\}
  = \bigcup_{n=1}^{\infty} \{p \in P(XY): (p\lambda)(E_n)>-c\}.$$

Proceed as before.

In the general case,

$$\int f(x,y)\,q(dy|x) = \int f^+(x,y)\,q(dy|x) - \int f^-(x,y)\,q(dy|x).$$

The functions $f^+$ and $-f^-$ are lower semianalytic, so by the above
arguments, each of the summands on the right is lower semianalytic. The
theorem follows from the remark following Definition 2.3. QED

¹ Lebesgue measure on $[0,n]$ is not a probability measure, but the extension
of Corollary B.3.3 to finite measures is immediate.

Corollary 2.4.1 Let $X$ be a Borel space, and let $f: X \to R^*$ be lower
semianalytic. Then the function $p \to \int f\,dp$ is lower semianalytic on
$P(X)$.

Proof:

Define a stochastic kernel on $X$ given $P(X)$ by $q(\cdot|p)=p$. Apply
Theorem 2.4. QED

We state and prove the key selection theorem. This is an extension of a
theorem by L. D. Brown and R. Purves [7, Theorem 2], in that we allow $f$ to be
lower semianalytic rather than Borel measurable. Our proof parallels theirs.

Theorem 2.5 Let $X$ and $Y$ be Borel spaces, $D \subset XY$ an analytic set,
and $f: D \to R^*$ lower semianalytic. Then

(a) The set

$$I = \{x \in \mathrm{proj}_X D: \text{for some } y_0 \in Y,\ f(x,y_0) = \inf_{y \in Y} f(x,y)\}$$

is universally measurable;

(b) for each $\varepsilon>0$, there is a universally measurable selector
$\varphi: \mathrm{proj}_X D \to Y$ satisfying

$f(x,\varphi(x)) = \min_{y \in Y} f(x,y)$ if $x \in I$;

$f(x,\varphi(x)) \le \varepsilon + \inf_{y \in Y} f(x,y)$ if $x \notin I$, $\inf_{y \in Y} f(x,y) \ne -\infty$;

$f(x,\varphi(x)) \le -1/\varepsilon$ if $x \notin I$, $\inf_{y \in Y} f(x,y) = -\infty$.

Proof:

Assume first that $D=XY$. The set

$$E = \{(x,y,b): f(x,y) \le b\}
    = \bigcap_{k=1}^{\infty} \bigcup_{r \in Q^*} \{(x,y,b): f(x,y) \le r,\ r \le b + 1/k\}$$

is analytic in $XYR^*$ by Theorem A.2. The set

$$A = \mathrm{proj}_{XR^*} E$$

is analytic in $XR^*$ by Corollary A.3.1.

By von Neumann's Lemma (Theorem A.o), there is an analytically measurable
$\rho: A \to Y$ such that $(x,\rho(x,b),b) \in E$ for every $(x,b) \in A$.
Define $\psi: I \to Y$ by

$$\psi(x) = \rho(x, \inf_{y \in Y} f(x,y)).$$

For $x \in I$, $(x,\inf_{y \in Y} f(x,y)) \in A$, so $\psi$ is defined. The
mapping

$$T: x \to (x, \inf_{y \in Y} f(x,y))$$

is analytically measurable (Theorem 2.3), and so $\psi$ is universally
measurable (Corollary 2.2.2), provided $I$ is universally measurable. By
Theorem 2.2,

$$I = \{x: (x,\inf_{y \in Y} f(x,y)) \in A\} = T^{-1}(A)$$

is universally measurable.

For $\varepsilon>0$, define the analytic sets

$$E_\varepsilon = \{(x,y,b): f(x,y) \le b + \varepsilon\},$$

$$A_\varepsilon = \mathrm{proj}_{XR^*} E_\varepsilon.$$

Let $\rho_\varepsilon: A_\varepsilon \to Y$ be analytically measurable and
satisfy $(x,\rho_\varepsilon(x,b),b) \in E_\varepsilon$ for every
$(x,b) \in A_\varepsilon$. Define $g: X \to R^*$ by

$g(x) = \inf_{y \in Y} f(x,y)$ if $\inf_{y \in Y} f(x,y) \ne -\infty$;

$g(x) = -(1/\varepsilon + \varepsilon)$ otherwise.

Then $g$ is analytically measurable, so

$$\psi_\varepsilon(x) = \rho_\varepsilon(x,g(x))$$

is universally measurable.

The function $\varphi$ defined by

$\varphi(x) = \psi(x)$ if $x \in I$;

$\varphi(x) = \psi_\varepsilon(x)$ if $x \notin I$;

is universally measurable. For $x \in I$,

$$f(x,\varphi(x)) = f(x,\psi(x)) = f(x,\rho(x,\inf_{y \in Y} f(x,y))) \le \inf_{y \in Y} f(x,y).$$

For $x \notin I$,

$$f(x,\varphi(x)) = f(x,\psi_\varepsilon(x)) = f(x,\rho_\varepsilon(x,g(x))),$$

which is

$\le \varepsilon + g(x) \le \varepsilon + \inf_{y \in Y} f(x,y)$ if $\inf_{y \in Y} f(x,y) \ne -\infty$;

$\le -1/\varepsilon$ if $\inf_{y \in Y} f(x,y) = -\infty$.

Now if $D$ is a proper subset of $XY$, extend $f$ to $XY$ by setting it equal
to $+\infty$ outside $D$. Let $\hat\varphi$ be the selector given by the above
argument applied to this extended $f$. Let
$\varphi_1: \mathrm{proj}_X D \to Y$ be an analytically measurable function
such that $(x,\varphi_1(x)) \in D$ for every $x \in \mathrm{proj}_X D$. Set
$\varphi$ equal to $\hat\varphi$ on
$\{x \in \mathrm{proj}_X D: \inf_{y \in Y} f(x,y)<+\infty\}$ and equal to
$\varphi_1$ on $\{x \in \mathrm{proj}_X D: \inf_{y \in Y} f(x,y)=+\infty\}$.
Then $\varphi$ has the required properties. QED
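The three cases in part (b) of Theorem 2.5 can be illustrated on a toy problem
with $Y$ the positive integers. The sketch below is an assumption-heavy
illustration, not the construction of the proof: the infimum and the set where
it is attained are supplied by the caller as the hypothetical oracles `inf_f`
and `attained`, and the selector simply searches $y=1,2,\ldots$ for the first
point meeting the relevant requirement.

```python
import itertools
import math


def select(x, f, inf_f, attained, eps, max_y=10**6):
    """Return y attaining the infimum if it is attained, else an eps-good y."""
    target = inf_f(x)
    for y in itertools.count(1):
        if y > max_y:
            raise RuntimeError("search limit reached")
        v = f(x, y)
        if attained(x):
            if v == target:          # exact minimizer when x is in I
                return y
        elif target == -math.inf:
            if v <= -1.0 / eps:      # infimum is -infinity
                return y
        elif v <= target + eps:      # finite infimum that is not attained
            return y


def demo_f(x, y):
    if x > 0:
        return x / y        # infimum 0, never attained
    if x == 0:
        return abs(y - 3)   # infimum 0, attained at y = 3
    return x * y            # infimum -infinity


def demo_inf(x):
    return 0.0 if x >= 0 else -math.inf


def demo_attained(x):
    return x == 0


picks = {x: select(x, demo_f, demo_inf, demo_attained, eps=0.5)
         for x in (0, 2, -1)}
```

Each of the three states in `picks` exercises a different case of part (b).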

Section 4. Lower semicontinuous functions and selection under compactness
assumptions

In this section some known results on semicontinuous functions are
listed. Most of the proofs are straightforward or easily referenced and thus
omitted.

Definition 2.4 Let $X$ be a locally compact metric space. A subset $K$ of $X$
is $\sigma$-compact if $K = \bigcup_{j=1}^{\infty} K_j$, where each $K_j$ is
compact.

Lemma 2.3 Let $X$ be a Borel space, $Y$ a compact metric space and $K_j$,
$j = 1, 2, \ldots$, a sequence of closed subsets of $XY$. Let
$K = \bigcup_{j=1}^{\infty} K_j$. Then $\mathrm{proj}_X K$ is a Borel set and
there is a Borel measurable map $\varphi: \mathrm{proj}_X K \to Y$ such that
$(x,\varphi(x)) \in K$ for all $x \in \mathrm{proj}_X K$.


Proof:

For each $x \in X$, the cross section $(K_j)_x$ is closed in $Y$ and
consequently compact. Therefore


Lemma 2.5 Let $X$ and $Y$ be metric spaces.

(a) Let $f_1$ and $f_2$ mapping $X$ into $R^*$ be lower semicontinuous and
either both bounded above or both bounded below. Then $f_1 + f_2$ is lower
semicontinuous.

(b) Let $\{f_k\}$ be a sequence of lower semicontinuous functions mapping $X$
into $R^*$. Then $\sup_k f_k$ is lower semicontinuous.

Theorem 2.6 Let $X$ be a metric space, $Y$ a compact metric space and
$f: XY \to R^*$ lower semicontinuous. Then the function

$$g(x) = \min_{y \in Y} f(x,y)$$

is lower semicontinuous.

Theorem 2.6 establishes that on compact spaces, lower semicontinuous
functions are closed under infimization in the same way that lower
semianalytic functions are on Borel spaces (Theorem 2.3). There is also a
selection theorem for lower semicontinuous functions (cf. Theorems 2.5 and
2.7). Similar to Theorem 2.4 for lower semianalytic functions and Borel
measurable stochastic kernels, we will establish Theorem 2.8 for lower
semicontinuous functions and continuous stochastic kernels.

Theorem 2.7 Let $X$ be a Borel space, $Y$ a compact metric space and
$f: XY \to R^*$ lower semicontinuous. Then there is a Borel measurable map
$\varphi: X \to Y$ such that

$$f(x,\varphi(x)) = \min_{y \in Y} f(x,y)$$

for all $x \in X$.

Proof: 

If  f is  real-valued,  this  theorem  is  analogous  to  a result  given  by 

Dubins  and  Savage  [10,  Chapter  2.16]  and  repeated  by  Maitra  [21]  for  upper 

semicontinuous  functions.  The  extension  to  extended  real-valued  functions  is 
immediate.  QED 

The remainder of the chapter borrows from Schael [33].

Lemma 2.6 Let $X$ and $Y$ be Borel spaces. For $p \in P(X)$, $q \in P(Y)$, let
$pq$ be the product measure on $XY$. The mapping $(p,q) \to pq$ is continuous.

Proof:

By Urysohn's Theorem [11, Chapter IX, Corollary 9.2], $X$ and $Y$ can be
homeomorphically embedded in compact metric spaces $\bar X$ and $\bar Y$. For
simplicity of notation, we treat $X$ as a subset of $\bar X$ and $Y$ as a
subset of $\bar Y$. By Theorem B.4, $X$ and $Y$ are Borel subsets of $\bar X$
and $\bar Y$ respectively.

Let $d$ be a metric on $XY$ consistent with its topology and
$f \in U_d(XY)$. Extend $f$ to $\bar f \in C(\bar X \bar Y)$ by Lemma B.1 and
the Tietze extension theorem [11]. By the Stone-Weierstrass Theorem [34,
Section 36, Theorem A], the set of finite linear combinations of the form

$$\sum_{j=1}^{k} \bar g_j(x)\,\bar h_j(y), \quad \bar g_j \in C(\bar X),\ \bar h_j \in C(\bar Y),$$

is dense in $C(\bar X \bar Y)$ with respect to the supremum norm. Thus for
given $\varepsilon>0$, it is possible to find such a linear combination
$\sum_{j=1}^{k} \bar g_j(x)\,\bar h_j(y)$ which approximates $\bar f$ uniformly
to within $\varepsilon$. The restrictions $g_j$ and $h_j$ of $\bar g_j$ and
$\bar h_j$ to $X$ and $Y$ are in $C(X)$ and $C(Y)$ respectively. By Theorem
B.1, if $p_n \to p$ in $P(X)$ and $q_n \to q$ in $P(Y)$,

$$\limsup_n \Big| \int_{XY} f\,d(p_nq_n) - \int_{XY} f\,d(pq) \Big|
  \le \limsup_n \int_{XY} \Big| f - \sum_{j=1}^{k} g_jh_j \Big|\,d(p_nq_n)$$
$$\qquad + \Big| \lim_n \sum_{j=1}^{k} \int_X g_j\,dp_n \int_Y h_j\,dq_n - \sum_{j=1}^{k} \int_X g_j\,dp \int_Y h_j\,dq \Big|
  + \int_{XY} \Big| \sum_{j=1}^{k} g_jh_j - f \Big|\,d(pq)
  \le 2\varepsilon.$$

The continuity of $(p,q) \to pq$ follows from Theorem B.1. QED


Definition 2.6 Let $X$ and $Y$ be Borel spaces and $q(dy|x)$ a stochastic kernel
on $Y$ given $X$. If the mapping $x \to q(dy|x)$ is continuous from $X$ to $P(Y)$, the
stochastic kernel $q$ is said to be continuous.

Lemma 2.7 Let $X$ and $Y$ be Borel spaces, $f \in C(XY)$, and $q(dy|x)$ a continuous
stochastic kernel on $Y$ given $X$. Then the function

$$x \to \int f(x,y)\,q(dy|x)$$

is in $C(X)$.

Proof: 

By Lemma 2.6 and Corollary B.7.1, the mapping

$$x \to p_x\,q(dy|x)$$

is continuous, where $p_x$ is the probability measure assigning unit mass to the


is bounded below and lower semicontinuous.

Proof:

Use Lemmas 2.4, 2.5(b) and 2.7. QED


CHAPTER 3


THE FINITE HORIZON MODEL

Section 1. The stochastic model

Definition 3.1 A stochastic decision model (SM) is the nine-tuple
$(S,C,\Gamma,W,p,f,\alpha,g',N)$ described below. The letters $x$ and $u$ are used to denote
elements of $S$ and $C$ respectively.

$S$: State space. A nonempty Borel space.

$C$: Control space. A nonempty Borel space.

$\Gamma$: Constraint set. An analytic subset of $SC$. For every $x \in S$, $\Gamma_x$ is nonempty.

$W$: Disturbance space. A nonempty Borel space.

$p(dw|x,u)$: Disturbance kernel. A Borel measurable stochastic kernel on $W$
given $SC$.

$f$: System function. Maps $SCW$ into $S$ and is measurable with respect to the
product Borel $\sigma$-algebra.

$\alpha$: Discount factor. $0 < \alpha \le 1$.

$g'$: One-stage cost function. Maps $SCW$ into $R^*$ and is lower semianalytic.

$N$: Horizon. A positive integer.

The system moves from state $x_k$ to state $x_{k+1}$ via the system equation

$$x_{k+1} = f(x_k,u_k,w_k), \qquad k=0,1,\ldots,N-1,$$

and incurs cost at each stage of $g'(x_k,u_k,w_k)$. The disturbances $w_k$ are random
objects with probability distributions $p(dw_k|x_k,u_k)$. The goal is to choose $u_k$
dependent on the current state $x_k$ so as to minimize

$$E\Big\{ \sum_{k=0}^{N-1} \alpha^k g'(x_k,u_k,w_k) \Big\}.$$

We make this discussion precise in Definitions 3.2 - 3.6.

Definition 3.2 Given (SM) of Definition 3.1, the state transition kernel is
the Borel measurable stochastic kernel on $S$ given $SC$ defined by

(3.1)  $t(A|x,u) = p(\{w : f(x,u,w) \in A\}\,|\,x,u) = p(f^{-1}(A)_{(x,u)}\,|\,x,u).$

Thus defined, $t(A|x,u)$ is the probability that $x_{k+1} = f(x_k,u_k,w_k)$ is in $A$
given that $(x_k,u_k) = (x,u)$. For fixed $(x,u)$, $t(\cdot|x,u)$ is clearly a probability
measure on $S$. To show that $t$ is Borel measurable, we show that $p(B_{(x,u)}|x,u)$
is measurable for each Borel subset $B$ of $SCW$. The sets $B$ for which
$p(B_{(x,u)}|x,u)$ is measurable form a Dynkin system, so by the Dynkin system
theorem [1, Theorem 4.1.2], we need only verify that
$p((\underline{S}\,\underline{C}\,\underline{W})_{(x,u)}|x,u)$ is measurable for
$\underline{S} \in \mathcal{B}_S$, $\underline{C} \in \mathcal{B}_C$, $\underline{W} \in \mathcal{B}_W$. But

$$p((\underline{S}\,\underline{C}\,\underline{W})_{(x,u)}|x,u) = p(\underline{W}|x,u) \ \text{ if } (x,u) \in \underline{S}\,\underline{C}; \qquad = 0 \ \text{ otherwise.}$$

Using $p$ it is possible to "integrate out" the $w$ in $g'$ so that the
disturbance space $W$ disappears entirely from the model description. To do
this, define the (reduced) one-stage cost function

(3.2)  $g(x,u) = \int g'(x,u,w)\,p(dw|x,u).$

Then $g$ is a lower semianalytic function on $SC$ (Theorem 2.4).
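To make (3.1) and (3.2) concrete, the following sketch computes $t$ and $g$ for a small finite model, where the integrals reduce to finite sums. The state, control and disturbance sets, the kernel `p`, the system function `f` and the cost `g_prime` are all hypothetical choices made only for illustration; they do not come from the text.

```python
from fractions import Fraction

# Hypothetical finite model (all names and numbers are illustrative assumptions).
S = [0, 1, 2]            # states
C = [0, 1]               # controls
W = ["lo", "hi"]         # disturbances

def p(w, x, u):
    """Disturbance kernel p({w} | x, u); in this toy model it depends only on u."""
    hi = Fraction(1, 4) if u == 0 else Fraction(3, 4)
    return hi if w == "hi" else 1 - hi

def f(x, u, w):
    """System function: move up on 'hi', down on 'lo', clipped to S."""
    step = 1 if w == "hi" else -1
    return min(max(x + step, S[0]), S[-1])

def g_prime(x, u, w):
    """One-stage cost that still depends on the disturbance w."""
    return Fraction(x) + u + (2 if w == "hi" else 0)

def t(A, x, u):
    """Transition kernel (3.1): t(A | x, u) = p({w : f(x, u, w) in A} | x, u)."""
    return sum(p(w, x, u) for w in W if f(x, u, w) in A)

def g(x, u):
    """Reduced cost (3.2): g(x, u) = sum over w of g'(x, u, w) p({w} | x, u)."""
    return sum(g_prime(x, u, w) * p(w, x, u) for w in W)

for x in S:
    for u in C:
        row = [t({y}, x, u) for y in S]
        assert sum(row) == 1          # t(. | x, u) is a probability measure
        print(x, u, [str(r) for r in row], str(g(x, u)))
```

After this reduction only `t` and `g` are needed; `W`, `p`, `f` and `g_prime` no longer appear, which is the sense in which the disturbance space disappears from the model description.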

Definition 3.3 A policy in (SM) is a sequence $\pi = (\mu_0,\mu_1,\ldots,\mu_{N-1})$ such that
for each $k$, $\mu_k(du_k|x_0,u_0,\ldots,u_{k-1},x_k)$ is a universally measurable stochastic
kernel on $C$ given $SC \cdots CS$ satisfying

$$\mu_k(\Gamma_{x_k}|x_0,u_0,\ldots,u_{k-1},x_k) = 1$$

for every $(x_0,u_0,\ldots,u_{k-1},x_k)$. If for each $k$, $\mu_k$ is parameterized only by
$(x_0,x_k)$, $\pi$ is a semi-Markov policy. If $\mu_k$ is parameterized only by $x_k$, $\pi$ is a
Markov policy. If for each $k$ and $(x_0,u_0,\ldots,u_{k-1},x_k)$, $\mu_k(\cdot|x_0,u_0,\ldots,u_{k-1},x_k)$
assigns mass one to some point in $C$, $\pi$ is nonrandomized. In this case, by a
slight abuse of notation, $\pi$ can be considered to be a sequence of universally
measurable mappings $\mu_k: SC \cdots CS \to C$ such that

$$(x_k,\mu_k(x_0,u_0,\ldots,u_{k-1},x_k)) \in \Gamma$$

for every $(x_0,u_0,\ldots,u_{k-1},x_k)$. A policy is said to be Borel measurable if all
its stochastic kernel components are.

We denote by $\Pi'$ the set of all policies in (SM) and by $\Pi$ the set of all
Markov policies. We will show that in many cases it is not necessary to go
outside $\Pi$ to find the "best" available policy. In most cases, this "best"
policy can be taken to be nonrandomized. By von Neumann's Lemma (Theorem
A.6), there exists at least one nonrandomized Markov policy, so $\Pi$ and $\Pi'$ are
nonempty.


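For $p_0 \in P(S)$ and $\pi = (\mu_0,\ldots,\mu_{N-1}) \in \Pi'$, we say the measures $q_k \in P(SC)$,
$k=0,1,\ldots,N-1$, are generated from $p_0$ by $\pi$ if for every
$\underline{S}_k \in \mathcal{B}_S$, $\underline{C}_k \in \mathcal{B}_C$,

(3.3)  $q_k(\underline{S}_k\underline{C}_k) = \int_{S_0}\int_{C_0} \cdots \int_{C_{k-1}}\int_{\underline{S}_k} \mu_k(\underline{C}_k|x_0,u_0,\ldots,u_{k-1},x_k)\, t(dx_k|x_{k-1},u_{k-1}) \cdots \mu_0(du_0|x_0)\,p_0(dx_0).$

If $(q_0,\ldots,q_{N-1})$ is generated from $p_x$ by $\pi$ (here $p_x$ assigns unit mass to $x$), we
say that it is generated from $x$ by $\pi$.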


Definition 3.4 If $x \in S$, $\pi \in \Pi'$, $K=1,\ldots,N$, and $(q_0,\ldots,q_{N-1})$ is generated by
$\pi$ from $x$, then the K-stage cost function corresponding to $\pi$ at $x$ is

(3.4)  $J_{K,\pi}(x) = \sum_{k=0}^{K-1} \alpha^k \int g\,dq_k.$

The cost function corresponding to $\pi$ is $J_{N,\pi}$.

Definition 3.5 Given $x \in S$, $K \le N$, the K-stage optimal cost function at $x$ is

$$J^*_K(x) = \inf_{\pi \in \Pi'} J_{K,\pi}(x).$$

The optimal cost function is $J^*_N$.

Note that $J^*_K$ is independent of the horizon $N$ as long as $K \le N$.

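Definition 3.6 If $\varepsilon > 0$, the policy $\pi$ is K-stage $\varepsilon$-optimal at $x$ provided

$$J_{K,\pi}(x) \le J^*_K(x) + \varepsilon \ \text{ if } J^*_K(x) > -\infty; \qquad J_{K,\pi}(x) \le -1/\varepsilon \ \text{ if } J^*_K(x) = -\infty.$$

If $J_{K,\pi}(x) = J^*_K(x)$, then $\pi$ is K-stage optimal at $x$. If $\pi$ is K-stage $\varepsilon$-optimal
or K-stage optimal at every $x$, it is K-stage $\varepsilon$-optimal or K-stage optimal
respectively. If $\{\varepsilon_n\}$ is a sequence of positive numbers with $\varepsilon_n \downarrow 0$, a
sequence of policies $\{\pi_n\}$ exhibits $\{\varepsilon_n\}$ dominated convergence to optimality
provided

$$\lim_{n\to\infty} J_{K,\pi_n} = J^*_K,$$

and for $n=2,3,\ldots$

$$J_{K,\pi_n}(x) \le J^*_K(x) + \varepsilon_n \ \text{ if } J^*_K(x) > -\infty,$$

$$J_{K,\pi_n}(x) \le J_{K,\pi_{n-1}}(x) + \varepsilon_n \ \text{ if } J^*_K(x) = -\infty.$$

If $K=N$, we suppress the qualifier "K-stage" in the above terms.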


If the model (SM) is such that for all $x \in S$ and $\pi \in \Pi'$, the measures
$(q_0,\ldots,q_{N-1})$ generated from $x$ by $\pi$ satisfy $\int g^+\,dq_k < \infty$, $k=0,\ldots,N-1$, we say
(SM) is summable above and use the symbol (F-) to show that a result holds
under this assumption. If $\int g^-\,dq_k < \infty$ for every $q_k$ which is an element of a
sequence of measures generated from some $x \in S$ by some $\pi \in \Pi'$, we say (SM) is
summable below and use the symbol (F+) to show that a result holds under this
assumption. Under (F+), $J_{K,\pi}$ is a mapping from $S$ to $(-\infty,+\infty]$ for each
policy $\pi$ and each $K$, while under (F-), $J_{K,\pi}$ maps $S$ into $[-\infty,+\infty)$.

It will often be convenient to subscript the state and control spaces as
is done in the next theorem. Except for Chapter 6, Section 1, $S_k$ will always
be a copy of $S$ and $C_k$ will always be a copy of $C$.

Theorem 3.1 If $x \in S$ and $\pi' \in \Pi'$, then there is a Borel measurable Markov
policy $\pi$ such that $J_{K,\pi}(x) = J_{K,\pi'}(x)$, $K=1,\ldots,N$.

Proof: 

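Let $\pi' = (\mu'_0,\ldots,\mu'_{N-1})$. Given $x \in S$, let $(q'_0,\ldots,q'_{N-1})$ be the sequence
of measures generated by $\pi'$ from $x$. Let $\mu_k(du_k|x_k)$ be the Borel measurable
stochastic kernel obtained by decomposing $q'_k$ (Theorem C.1), i.e. for every
$\underline{S}_k \in \mathcal{B}_S$, $\underline{C}_k \in \mathcal{B}_C$,

$$q'_k(\underline{S}_k\underline{C}_k) = \int_{\underline{S}_k} \mu_k(\underline{C}_k|x_k)\,q'_k(dx_k\,C_k), \qquad k=0,\ldots,N-1.$$

Let $\pi = (\mu_0,\ldots,\mu_{N-1})$ and let $(q_0,\ldots,q_{N-1})$ be the sequence of measures
generated by $\pi$ from $x$. It suffices to show $q_k = q'_k$, $k=0,\ldots,N-1$. We proceed
by induction.

For $k=0$, $\underline{S}_0 \in \mathcal{B}_S$, $\underline{C}_0 \in \mathcal{B}_C$,

$$q_0(\underline{S}_0\underline{C}_0) = \int_{\underline{S}_0} \mu_0(\underline{C}_0|x_0)\,p_x(dx_0) = q'_0(\underline{S}_0\underline{C}_0).$$

If $q_k = q'_k$, then for $\underline{S}_{k+1} \in \mathcal{B}_S$, $\underline{C}_{k+1} \in \mathcal{B}_C$,

$$q_{k+1}(\underline{S}_{k+1}\underline{C}_{k+1}) = \int_{\underline{S}_{k+1}} \mu_{k+1}(\underline{C}_{k+1}|x_{k+1})\,q_{k+1}(dx_{k+1}\,C_{k+1})$$
$$\qquad = \int_{S_kC_k}\int_{\underline{S}_{k+1}} \mu_{k+1}(\underline{C}_{k+1}|x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,q_k(d(x_k,u_k))$$
$$\qquad = q'_{k+1}(\underline{S}_{k+1}\underline{C}_{k+1}),$$

where the last equality follows from the induction hypothesis. QED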


Corollary 3.1.1 For every $K \le N$ and $x \in S$, $\inf_{\pi \in \Pi'} J_{K,\pi}(x) = \inf_{\pi \in \Pi} J_{K,\pi}(x)$.


Section 2. The dynamic programming operators. Existence of


Let $U(C|S)$ denote the set of universally measurable stochastic kernels $\mu$
on $C$ given $S$ which satisfy $\mu(\Gamma_x|x) = 1$ for every $x \in S$. Thus
$\Pi = U(C|S)U(C|S) \cdots U(C|S)$, where there are $N$ factors.


Definition 3.7 Let $J: S \to R^*$ be universally measurable and $\mu \in U(C|S)$.
The operator $T_\mu$ mapping $J$ into $T_\mu J: S \to R^*$ is defined by

$$(T_\mu J)(x) = \int_C \Big[ g(x,u) + \alpha \int_S J(x')\,t(dx'|x,u) \Big]\,\mu(du|x)$$

for every $x \in S$.


By Theorem 2.1, $T_\mu J$ is universally measurable. We show in Lemma 3.2 that
under (F+) or (F-), the cost corresponding to a policy $\pi = (\mu_0,\ldots,\mu_{N-1})$ can be
defined in terms of the operators $T_{\mu_0},\ldots,T_{\mu_{N-1}}$.


Definition 3.8 Let $J: S \to R^*$ be lower semianalytic. The operator $T$ mapping $J$
into $TJ: S \to R^*$ is defined by

$$(TJ)(x) = \inf_{u \in \Gamma_x} \Big\{ g(x,u) + \alpha \int J(x')\,t(dx'|x,u) \Big\}$$

for every $x \in S$.

By Theorems 2.3 and 2.4, $TJ$ is lower semianalytic. We show in Theorem
3.2 that under (F+) or (F-) the optimal cost can be defined in terms of the
operator $T$.
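As a rough illustration of Definitions 3.7 and 3.8, the sketch below implements $T_\mu$ and $T$ for a finite model, where the integrals become sums and the infimum becomes a minimum. The two-state model, the dictionary encoding of $\Gamma$, $g$ and $t$, and the kernel `mu` are assumptions introduced for this example only.

```python
# Hypothetical two-state model; every number below is an illustrative assumption.
S = [0, 1]
Gamma = {0: [0, 1], 1: [0]}                 # admissible controls Gamma_x
alpha = 0.9
g = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5}
t = {                                        # t[(x, u)][y] = t({y} | x, u)
    (0, 0): [0.5, 0.5],
    (0, 1): [0.0, 1.0],
    (1, 0): [1.0, 0.0],
}

def q_value(J, x, u):
    """g(x,u) + alpha * sum over y of J(y) t({y} | x, u)."""
    return g[(x, u)] + alpha * sum(J[y] * t[(x, u)][y] for y in S)

def T_mu(J, mu):
    """Definition 3.7, with mu[x][u] = mu({u} | x) supported on Gamma_x."""
    return [sum(mu[x][u] * q_value(J, x, u) for u in Gamma[x]) for x in S]

def T(J):
    """Definition 3.8: minimize over the admissible controls at each state."""
    return [min(q_value(J, x, u) for u in Gamma[x]) for x in S]

J0 = [0.0, 0.0]
mu = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}     # a randomized kernel in U(C|S)
TJ0 = T(J0)
TmuJ0 = T_mu(J0, mu)
print(TJ0, TmuJ0)
```

Because `T_mu` averages the same quantities that `T` minimizes, `T(J)[x] <= T_mu(J, mu)[x]` holds for every admissible `mu` in this finite setting.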

Lemma 3.1 Let $J: S \to R^*$ be lower semianalytic. Then for $\varepsilon > 0$, there exists
$\mu \in U(C|S)$ such that

$$(T_\mu J)(x) \le (TJ)(x) + \varepsilon \ \text{ if } (TJ)(x) > -\infty; \qquad (T_\mu J)(x) = -\infty \ \text{ if } (TJ)(x) = -\infty.$$


Proof : 

By Theorem 2.5, there are universally measurable selectors $\mu_m: S \to C$ such
that for $m=1,2,\ldots$ and $x \in S$, $\mu_m(x) \in \Gamma_x$ and

$$(T_{\mu_m} J)(x) \le (TJ)(x) + \varepsilon \ \text{ if } (TJ)(x) > -\infty; \qquad (T_{\mu_m} J)(x) \le -2^m \ \text{ if } (TJ)(x) = -\infty.$$

Let $\mu(\cdot|x)$ assign mass one to $\mu_1(x)$ if $(TJ)(x) > -\infty$ and assign mass $1/2^m$ to
$\mu_m(x)$, $m=1,2,\ldots$, if $(TJ)(x) = -\infty$. Then $\mu$ has the desired properties. QED
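The choice of weights $1/2^m$ against bounds $-2^m$ in this proof can be checked by a one-line computation: each term of the mixture contributes at most $-1$, so the first $M$ terms already force the cost below $-M$. The helper name below is hypothetical, and the sketch only evaluates this bound, not the operator itself.

```python
from fractions import Fraction

def truncated_mixture_bound(M):
    """Bound on the first M terms of the mixture's cost: sum_{m=1}^{M} 2^{-m} * (-2^m)."""
    return sum(Fraction(1, 2**m) * (-(2**m)) for m in range(1, M + 1))

# The weights 1/2^m form a probability distribution in the limit (partial sums approach 1).
total_mass = sum(Fraction(1, 2**m) for m in range(1, 41))
print(truncated_mixture_bound(10), float(total_mass))
```

Since all terms are negative, the full sum is bounded above by every partial sum, which is why the randomized kernel attains $-\infty$ even though no single selector $\mu_m$ need do so.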

Lemma 3.2 Let $\pi = (\mu_0,\ldots,\mu_{N-1})$ be in $\Pi$, $K=1,\ldots,N$, and $J_0$ identically zero.
Then

(3.5)  $J_{K,\pi} \ge (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0).$

Under (F+) or (F-), equality holds in (3.5).

Proof:

Let $\pi = (\mu_0,\ldots,\mu_{N-1})$ be in $\Pi$, $x \in S$ and $(q_0,\ldots,q_{N-1})$ be generated from $x$ by
$\pi$. Then for $K=1,\ldots,N$,

$$J_{K,\pi}(x) = \sum_{k=0}^{K-1} \alpha^k \int_{S_kC_k} g\,dq_k$$

$$= \sum_{k=0}^{K-2} \alpha^k \int_{S_kC_k} g\,dq_k + \int_{S_{K-2}C_{K-2}}\int_{S_{K-1}} \alpha^{K-1} (T_{\mu_{K-1}}J_0)(x_{K-1})\,t(dx_{K-1}|x_{K-2},u_{K-2})\,q_{K-2}(d(x_{K-2},u_{K-2}))$$

$$= \sum_{k=0}^{K-3} \alpha^k \int_{S_kC_k} g\,dq_k + \int_{S_{K-3}C_{K-3}}\int_{S_{K-2}}\int_{C_{K-2}} \alpha^{K-2} g(x_{K-2},u_{K-2})\,\mu_{K-2}(du_{K-2}|x_{K-2})\,t(dx_{K-2}|x_{K-3},u_{K-3})\,q_{K-3}(d(x_{K-3},u_{K-3}))$$
$$\quad + \int_{S_{K-3}C_{K-3}}\int_{S_{K-2}}\int_{C_{K-2}} \alpha^{K-2} \Big[ \alpha \int_{S_{K-1}} (T_{\mu_{K-1}}J_0)(x_{K-1})\,t(dx_{K-1}|x_{K-2},u_{K-2}) \Big]\,\mu_{K-2}(du_{K-2}|x_{K-2})\,t(dx_{K-2}|x_{K-3},u_{K-3})\,q_{K-3}(d(x_{K-3},u_{K-3}))$$

$$\ge \sum_{k=0}^{K-3} \alpha^k \int_{S_kC_k} g\,dq_k + \int_{S_{K-3}C_{K-3}} \alpha^{K-2} \int_{S_{K-2}} (T_{\mu_{K-2}}T_{\mu_{K-1}})(J_0)(x_{K-2})\,t(dx_{K-2}|x_{K-3},u_{K-3})\,q_{K-3}(d(x_{K-3},u_{K-3}))$$

by (2.2).

Repeating this procedure finitely many times, we eventually obtain

$$J_{K,\pi}(x) \ge (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(x).$$

If (SM) is summable below or summable above, the above inequalities are
equalities. QED

Lemma 3.3 (F+)

If $J_0$ is identically zero, then $(T^kJ_0)(x) > -\infty$ for every $x \in S$, $k=1,\ldots,N$.


Proof: 


Suppose for some $K \le N$ and $\bar{x} \in S$ that

$$(T^jJ_0)(x) > -\infty, \qquad j=0,\ldots,K-1,$$

for every $x \in S$, and

$$(T^KJ_0)(\bar{x}) = -\infty.$$

By Theorem 2.5, there are universally measurable selectors $\mu_j: S \to C$,
$j=1,\ldots,K-1$, such that $\mu_j(x) \in \Gamma_x$ and

$$T_{\mu_j}(T^{K-j-1}J_0)(x) \le (T^{K-j}J_0)(x) + 1, \qquad j=1,\ldots,K-1,$$

for every $x \in S$. Then

$$(T_{\mu_1} \cdots T_{\mu_{K-1}})(J_0) \le (T_{\mu_1} \cdots T_{\mu_{K-2}})(TJ_0 + 1)$$
$$\le (T_{\mu_1} \cdots T_{\mu_{K-3}})(T^2J_0 + 1 + \alpha)$$
$$\le T^{K-1}J_0 + M,$$

where $M$ is a constant and the last inequality is obtained by repeating the
process used to obtain the first two inequalities. By Lemma 3.1, there is a
stochastic kernel $\mu_0 \in U(C|S)$ such that

$$T_{\mu_0}(T^{K-1}J_0)(\bar{x}) = -\infty.$$

Then

$$(T_{\mu_0}T_{\mu_1} \cdots T_{\mu_{K-1}})(J_0)(\bar{x}) \le T_{\mu_0}(T^{K-1}J_0 + M)(\bar{x}) = -\infty.$$

For $\mu \in U(C|S)$, let $\pi = (\mu_0,\ldots,\mu_{K-1},\mu,\ldots,\mu) \in \Pi$ and let $(q_0,\ldots,q_{N-1})$ be
generated from $\bar{x}$ by $\pi$. By Lemma 3.2,

$$\sum_{j=0}^{K-1} \alpha^j \int g\,dq_j = J_{K,\pi}(\bar{x}) = (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(\bar{x}) = -\infty,$$

and so for some $j$, $\int g^-\,dq_j = \infty$. This contradicts the fact that (SM) is
summable below. QED

Lemma 3.4 Let $\{J_k\}$ be a sequence of extended real-valued universally
measurable functions on $S$ and let $\mu$ be an element of $U(C|S)$.

(a) If $(T_\mu J_1)(x) < \infty$ for every $x \in S$ and $J_k \downarrow J$, then $T_\mu J_k \downarrow T_\mu J$.

(b) If $(T_\mu J_1)(x) < \infty$ for every $x \in S$, $g \ge 0$, and $J_k \uparrow J$, then $T_\mu J_k \uparrow T_\mu J$.

(c) If $\{J_k\}$ is uniformly bounded, $g$ is bounded, and $J_k \to J$, then $T_\mu J_k \to T_\mu J$.

Proof : 

Assume first that $T_\mu J_1 < \infty$ and $J_k \downarrow J$. Fix $x$. Since

$$\int \Big[ g(x,u) + \alpha \int J_1(x')\,t(dx'|x,u) \Big]\,\mu(du|x) < \infty,$$

we have

$$g(x,u) + \alpha \int J_1(x')\,t(dx'|x,u) < \infty$$

for $\mu(\cdot|x)$-almost all $u$. By the monotone convergence theorem,

$$g(x,u) + \alpha \int J_k(x')\,t(dx'|x,u) \ \downarrow\ g(x,u) + \alpha \int J(x')\,t(dx'|x,u)$$

for $\mu(\cdot|x)$-almost all $u$. Apply the monotone convergence theorem again to
conclude $(T_\mu J_k)(x) \downarrow (T_\mu J)(x)$.

If $T_\mu J_1 < \infty$, $g \ge 0$, and $J_k \uparrow J$, the same type of argument applies. If $\{J_k\}$
is uniformly bounded, $g$ is bounded, and $J_k \to J$, a similar argument using the
bounded convergence theorem applies. QED

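Lemma 3.5 Let $\{J_k\}$ be a sequence of universally measurable functions from $S$
to $R^*$ and $\mu$ a universally measurable function from $S$ to $C$ whose graph lies in
$\Gamma$. Suppose for some sequence $\{\varepsilon_k\}$ of positive numbers with
$\sum_{k=1}^{\infty} \varepsilon_k < \infty$ we have for every $x \in S$,

$$\int J_1(x')\,t(dx'|x,\mu(x)) < +\infty,$$

$$\lim_{k\to\infty} J_k = J,$$

and for $k=2,3,\ldots$

$$J(x) \le J_k(x) \le J(x) + \varepsilon_k \ \text{ if } J(x) > -\infty,$$

$$J_k(x) \le J_{k-1}(x) + \varepsilon_k \ \text{ if } J(x) = -\infty.$$

Then

$$\lim_{k\to\infty} T_\mu J_k = T_\mu J.$$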


Proof : 


Since $J \le J_k$ for every $k$, it is clear that

$$T_\mu J \le \liminf_{k\to\infty} T_\mu J_k.$$

For $x \in S$,

$$\limsup_{k\to\infty} (T_\mu J_k)(x) \le g(x,\mu(x)) + \alpha \limsup_{k\to\infty} \int_{\{J>-\infty\}} J_k(x')\,t(dx'|x,\mu(x)) + \alpha \limsup_{k\to\infty} \int_{\{J=-\infty\}} J_k(x')\,t(dx'|x,\mu(x)).$$

Now

$$\limsup_{k\to\infty} \int_{\{J>-\infty\}} J_k(x')\,t(dx'|x,\mu(x)) \le \limsup_{k\to\infty} \Big[ \int_{\{J>-\infty\}} J(x')\,t(dx'|x,\mu(x)) + \varepsilon_k \Big] \le \int_{\{J>-\infty\}} J(x')\,t(dx'|x,\mu(x)).$$

If $J(x') = -\infty$, then

$$J_k(x') + \sum_{n=k+1}^{\infty} \varepsilon_n \ \downarrow\ J(x'),$$

$t(dx'|x,\mu(x))$-almost surely, and since
$\int \big[ J_1(x') + \sum_{n=2}^{\infty} \varepsilon_n \big]\,t(dx'|x,\mu(x)) < \infty$,

$$\limsup_{k\to\infty} \int_{\{J=-\infty\}} J_k(x')\,t(dx'|x,\mu(x)) \le \lim_{k\to\infty} \int_{\{J=-\infty\}} \Big[ J_k(x') + \sum_{n=k+1}^{\infty} \varepsilon_n \Big]\,t(dx'|x,\mu(x)) = \int_{\{J=-\infty\}} J(x')\,t(dx'|x,\mu(x)).$$

It follows that

$$\limsup_{k\to\infty} T_\mu J_k \le T_\mu J. \qquad \text{QED}$$

The  dynamic  programming  algorithm  over  a finite  horizon  is  executed  by 
beginning  with  the  identically  zero  function  on  S and  applying  the  operator  T 
successively  N times.  The  next  theorem  says  that  this  procedure  generates  the 
optimal cost function. In Corollary 3.2.2, we show how $\varepsilon$-optimal policies can
also be obtained from this algorithm.


Theorem 3.2 (F+)(F-)

Let $J_0$ be the identically zero function on $S$. Then $J^*_N = T^NJ_0$.


Proof: 


For any $\pi = (\mu_0,\ldots,\mu_{N-1}) \in \Pi$, $K \le N$,

(3.6)  $J_{K,\pi} = (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0) \ge (T_{\mu_0} \cdots T_{\mu_{K-2}})(TJ_0) \ge T^KJ_0,$

where the last inequality is obtained by repeating the process used to obtain
the first inequality. Infimizing over $\pi \in \Pi$ when $K=N$, we obtain

$$J^*_N \ge T^NJ_0.$$

If (SM) is summable below, then by Lemma 3.3, $(T^kJ_0) > -\infty$, $k=1,\ldots,N$. For
$\varepsilon > 0$, there are universally measurable selectors $\mu_k: S \to C$, $k=0,\ldots,N-1$, with
$\mu_k(x) \in \Gamma_x$ and

$$-\infty < T_{\mu_{N-k}}(T^{k-1}J_0)(x) \le (T^kJ_0)(x) + \varepsilon/N, \qquad k=1,\ldots,N,$$

for every $x \in S$ (Theorem 2.5). Then

(3.7)  $(T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0) \le (T_{\mu_0} \cdots T_{\mu_{N-2}})(TJ_0 + \varepsilon/N)$
       $\le (T_{\mu_0} \cdots T_{\mu_{N-3}})(T^2J_0 + \varepsilon/N + \alpha\varepsilon/N)$
       $\le T^NJ_0 + \varepsilon,$

where the last inequality is obtained by repeating the process used to obtain
the first two inequalities. It follows that $J^*_N \le T^NJ_0$.

If (SM) is summable above, then $J_{K,\pi}(x) < \infty$ for every $x \in S$, $\pi \in \Pi$,
$K=1,\ldots,N$. By (3.6), $T^KJ_0(x) < \infty$ for every $x \in S$, $K=1,\ldots,N$. Use Theorem 2.5 to
choose nonrandomized policies $\pi^i = (\mu^i_0,\ldots,\mu^i_{N-1}) \in \Pi$ such that for every $x \in S$,

$$T_{\mu^i_k}(T^{N-k-1}J_0)(x) < \infty, \qquad k=0,\ldots,N-1,\ i=1,2,\ldots,$$

$$T_{\mu^i_k}(T^{N-k-1}J_0) \ \downarrow\ T^{N-k}J_0, \qquad k=0,\ldots,N-1,$$

as $i \to \infty$. Then

$$J^*_N \le \inf_{(i_0,\ldots,i_{N-1})} (T_{\mu^{i_0}_0} \cdots T_{\mu^{i_{N-1}}_{N-1}})(J_0)$$
$$= \inf_{i_0} \cdots \inf_{i_{N-1}} (T_{\mu^{i_0}_0} \cdots T_{\mu^{i_{N-1}}_{N-1}})(J_0)$$
$$= \inf_{i_0} \cdots \inf_{i_{N-2}} (T_{\mu^{i_0}_0} \cdots T_{\mu^{i_{N-2}}_{N-2}})\big[\inf_{i_{N-1}} T_{\mu^{i_{N-1}}_{N-1}}(J_0)\big]$$
$$= \inf_{i_0} \cdots \inf_{i_{N-2}} (T_{\mu^{i_0}_0} \cdots T_{\mu^{i_{N-2}}_{N-2}})(TJ_0)$$
$$= T^NJ_0,$$

where the last equality is obtained by repeating the process used to obtain
the previous equality. QED

Corollary 3.2.1 (F+)(F-)

The function $J^*_N$ is lower semianalytic.

Corollary 3.2.2

(F+) For each $\varepsilon > 0$, there exists a nonrandomized Markov $\varepsilon$-optimal policy.
(F-) For each $\varepsilon > 0$, there exists a nonrandomized semi-Markov $\varepsilon$-optimal
policy and a (randomized) Markov $\varepsilon$-optimal policy.


Proof:

If (SM) is summable below, then the policy $(\mu_0,\ldots,\mu_{N-1})$ constructed in
the proof of Theorem 3.2 is $\varepsilon$-optimal, nonrandomized and Markov.

Assume (SM) is summable above. We show first the existence of an
$\varepsilon$-optimal nonrandomized semi-Markov policy. Let $(\mu^i_0,\ldots,\mu^i_{N-1})$ be as in
the proof of Theorem 3.2. Then

$$J^*_N = T^NJ_0 = \inf_{(i_0,\ldots,i_{N-1})} (T_{\mu^{i_0}_0} \cdots T_{\mu^{i_{N-1}}_{N-1}})(J_0) = \inf_{(i_0,\ldots,i_{N-1})} J_{N,\pi^{(i_0,\ldots,i_{N-1})}},$$

where $\pi^{(i_0,\ldots,i_{N-1})} = (\mu^{i_0}_0,\ldots,\mu^{i_{N-1}}_{N-1})$.
Choose $\varepsilon > 0$ and define

$$\delta(x) = J^*_N(x) + \varepsilon \ \text{ if } J^*_N(x) > -\infty; \qquad \delta(x) = -1/\varepsilon \ \text{ if } J^*_N(x) = -\infty.$$

Order linearly the set $\{\pi^{(i_0,\ldots,i_{N-1})} : i_0,\ldots,i_{N-1}$ are positive integers$\}$
and define $\pi(x)$ to be the first $\pi^{(i_0,\ldots,i_{N-1})}$ such that

$$J_{N,\pi^{(i_0,\ldots,i_{N-1})}}(x) \le \delta(x).$$

Let the components of $\pi(x)$ be $(\mu_0(du_0|x),\mu_1(du_1|x,x_1),\ldots,\mu_{N-1}(du_{N-1}|x,x_{N-1}))$.
The sets $\{x : \pi(x) = \pi^{(i_0,\ldots,i_{N-1})}\}$ are universally measurable for each
$(i_0,\ldots,i_{N-1})$, so $(\mu_0(du_0|x),\mu_1(du_1|x,x_1),\ldots,\mu_{N-1}(du_{N-1}|x,x_{N-1}))$ is an
$\varepsilon$-optimal nonrandomized semi-Markov policy.

We now show the existence of an $\varepsilon$-optimal (randomized) Markov policy. By
Lemma 3.1, there exist $\mu_k \in U(C|S)$ such that for $k=1,\ldots,N$

$$T_{\mu_{N-k}}(T^{k-1}J_0) \le T^kJ_0 + \varepsilon/N.$$

Proceed as in (3.7). QED


If (SM) is summable above and $\varepsilon > 0$, it may not be possible to find an
$\varepsilon$-optimal nonrandomized Markov policy, as the following example demonstrates.

Example 3.1 Let $S = \{0,1,2,\ldots\}$, $C = \{1,2,\ldots\}$, $W = \{w_1,w_2\}$, $\Gamma = SC$, $N = 2$ and define

$$g(x,u) = -u \ \text{ if } x = 1; \qquad = 0 \ \text{ if } x \ne 1,$$

$$f(x,u,w) = 0 \ \text{ if } x = 0 \text{ or } x = 1 \text{ or } w = w_1; \qquad = 1 \ \text{ if } x \ne 0,\ x \ne 1 \text{ and } w = w_2,$$

$$p(\{w_1\}|x,u) = 1 - 1/x \ \text{ if } x \ne 0,\ x \ne 1,$$

$$p(\{w_2\}|x,u) = 1/x \ \text{ if } x \ne 0,\ x \ne 1.$$

Let $\pi = (\mu_0,\mu_1)$ be a nonrandomized Markov policy. If the initial state $x_0$ is
neither zero nor one, then regardless of the policy employed, $x_1 = 0$ with
probability $1 - (1/x_0)$, and $x_1 = 1$ with probability $1/x_0$. Once the system reaches
zero, it remains there at no further cost. If the system reaches one, it
moves to $x_2 = 0$ at a cost of $-\mu_1(1)$. Thus

$$J_{N,\pi}(x_0) = -\mu_1(1)/x_0 \ \text{ for } x_0 \ne 0,\ x_0 \ne 1,$$

and

$$J^*_N(x_0) = -\infty \ \text{ for } x_0 \ne 0,\ x_0 \ne 1.$$

For any $\varepsilon > 0$, $\pi$ cannot be $\varepsilon$-optimal.
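The arithmetic of Example 3.1 can be checked exactly with rational numbers. The sketch below assumes a discount factor of one (the cost formula in the example carries no $\alpha$), and the particular values `eps`, `mu1_of_one` and the semi-Markov rule $u_1 = \lceil x_0/\varepsilon \rceil$ are hypothetical choices made for illustration.

```python
import math
from fractions import Fraction

def two_stage_cost(x0, u1_at_state_one):
    """Exact two-stage cost from x0 >= 2 when control u1 is applied at x1 = 1."""
    prob_reach_one = Fraction(1, x0)        # p({w_2} | x0, u) = 1/x0
    stage0 = 0                              # g(x0, u) = 0 because x0 != 1
    stage1 = prob_reach_one * (-u1_at_state_one)    # g(1, u) = -u
    return stage0 + stage1

def markov_cost(x0, mu1_of_one):
    """A nonrandomized Markov policy uses the same control mu_1(1) for every x0."""
    return two_stage_cost(x0, mu1_of_one)

def semi_markov_cost(x0, eps):
    """A semi-Markov policy may let the stage-1 control depend on x0 (assumed rule)."""
    u1 = math.ceil(Fraction(x0) / eps)
    return two_stage_cost(x0, u1)

eps = Fraction(1, 10)
threshold = -1 / eps                        # epsilon-optimality requires cost <= -1/eps
mu1_of_one = 1000
fails = [x0 for x0 in range(2, 500) if markov_cost(x0, mu1_of_one) > threshold]
print(fails[0], semi_markov_cost(250, eps))
```

However large `mu1_of_one` is chosen, the Markov cost $-\mu_1(1)/x_0$ rises above $-1/\varepsilon$ once $x_0 > \varepsilon\,\mu_1(1)$, whereas the semi-Markov rule meets the threshold at every $x_0 \ge 2$ tested.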

In Example 3.1 it is possible to find a sequence of nonrandomized Markov
policies $\{\pi_n\}$ such that $J_{N,\pi_n} \downarrow J^*_N$. This example motivates the idea of
policies exhibiting $\{\varepsilon_n\}$ dominated convergence to optimality (Definition 3.6)
and the following corollary.

Corollary 3.2.3 (F-)

Let $\{\varepsilon_n\}$ be a sequence of positive numbers with $\varepsilon_n \downarrow 0$. There exists a
sequence of nonrandomized Markov policies $\{\pi_n\}$ exhibiting $\{\varepsilon_n\}$ dominated
convergence to optimality.

Proof : 

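For $N=1$, by Theorem 2.5 there exists a sequence of nonrandomized Markov
policies $\pi_n = (\mu^n_0)$ for which

$$(T_{\mu^n_0}J_0)(x) \le (TJ_0)(x) + \varepsilon_n \ \text{ if } (TJ_0)(x) > -\infty; \qquad (T_{\mu^n_0}J_0)(x) \le -1/\varepsilon_n \ \text{ if } (TJ_0)(x) = -\infty.$$

We may assume without loss of generality that

$$(T_{\mu^n_0}J_0)(x) \ \downarrow\ -\infty \ \text{ if } (TJ_0)(x) = -\infty.$$

Therefore $\{\pi_n\}$ exhibits $\{\varepsilon_n\}$ dominated convergence to optimality.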
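Suppose the result holds for $N-1$. Let $\pi_n = (\mu^n_1,\ldots,\mu^n_{N-1})$ be a sequence
of $(N-1)$-stage nonrandomized Markov policies exhibiting $\{\varepsilon_n/2\alpha\}$ dominated
convergence to optimality, i.e.

$$\lim_{n\to\infty} J_{N-1,\pi_n} = J^*_{N-1},$$

(3.8)  $J_{N-1,\pi_n}(x) \le J^*_{N-1}(x) + \varepsilon_n/2\alpha$ if $J^*_{N-1}(x) > -\infty$,

and

(3.9)  $J_{N-1,\pi_n}(x) \le J_{N-1,\pi_{n-1}}(x) + \varepsilon_n/2\alpha$ if $J^*_{N-1}(x) = -\infty$.

We assume without loss of generality that $\sum_{n=1}^{\infty} \varepsilon_n < \infty$. By Theorem 2.5, there
exists a sequence $\{\mu^n\}$ of universally measurable functions from $S$ to $C$ whose
graphs lie in $\Gamma$ such that

(3.10)  $(T_{\mu^n}J^*_{N-1})(x) \le J^*_N(x) + \varepsilon_n/2$ if $J^*_N(x) > -\infty$;
        $(T_{\mu^n}J^*_{N-1})(x) \le -2/\varepsilon_n$ if $J^*_N(x) = -\infty$.

We may assume without loss of generality that

(3.11)  $T_{\mu^n}J^*_{N-1} \le T_{\mu^{n-1}}J^*_{N-1}$, $n=2,3,\ldots$

By Theorem 2.4, the set

$$A(J^*_{N-1}) = \{(x,u) \in \Gamma : t(\{J^*_{N-1} = -\infty\}|x,u) > 0\}$$
$$= \{(x,u) \in \Gamma : \int -\chi_{\{J^*_{N-1} = -\infty\}}(x')\,t(dx'|x,u) < 0\}$$

is analytic in $SC$, and von Neumann's Lemma (Theorem A.6) implies the existence
of a universally measurable $\hat{\mu}: \mathrm{proj}_S A(J^*_{N-1}) \to C$ whose graph lies in $A(J^*_{N-1})$.
Define

$$\bar{\mu}^n(x) = \hat{\mu}(x) \ \text{ if } x \in \mathrm{proj}_S A(J^*_{N-1}); \qquad \bar{\mu}^n(x) = \mu^n(x) \ \text{ otherwise.}$$

Then $\bar{\pi}_n = (\bar{\mu}^n,\pi_n)$ is an N-stage nonrandomized Markov policy which we will show
exhibits $\{\varepsilon_n\}$ dominated convergence to optimality.

For $x \in \mathrm{proj}_S A(J^*_{N-1})$, we have

$$\limsup_{n\to\infty} J_{N,\bar{\pi}_n}(x) = \limsup_{n\to\infty} (T_{\hat{\mu}}J_{N-1,\pi_n})(x)$$
$$= (T_{\hat{\mu}}J^*_{N-1})(x) \quad \text{by Lemma 3.5}$$
$$= -\infty \quad \text{by choice of } \hat{\mu}.$$

For $x \notin \mathrm{proj}_S A(J^*_{N-1})$, we have for every $u \in \Gamma_x$

$$t(\{J^*_{N-1} = -\infty\}|x,u) = 0,$$

so by (3.8),

(3.12)  $J_{N,\bar{\pi}_n}(x) = (T_{\mu^n}J_{N-1,\pi_n})(x) \le (T_{\mu^n}J^*_{N-1})(x) + \varepsilon_n/2,$

and

$$\limsup_{n\to\infty} J_{N,\bar{\pi}_n}(x) \le \limsup_{n\to\infty} (T_{\mu^n}J^*_{N-1})(x) \le J^*_N(x)$$

by (3.10). It follows that

(3.13)  $\lim_{n\to\infty} J_{N,\bar{\pi}_n} = J^*_N.$

Suppose for fixed $x \in S$, we have $J^*_N(x) > -\infty$. Then $x \notin \mathrm{proj}_S A(J^*_{N-1})$ and we
have from (3.10) and (3.12),

(3.14)  $J_{N,\bar{\pi}_n}(x) \le (T_{\mu^n}J^*_{N-1})(x) + \varepsilon_n/2 \le J^*_N(x) + \varepsilon_n$ if $J^*_N(x) > -\infty$.

Suppose now that $J^*_N(x) = -\infty$. If $x \notin \mathrm{proj}_S A(J^*_{N-1})$, then (3.11) and (3.12) imply
for $n \ge 2$,

$$J_{N,\bar{\pi}_n}(x) \le (T_{\mu^n}J^*_{N-1})(x) + \varepsilon_n/2$$
$$\le (T_{\mu^{n-1}}J^*_{N-1})(x) + \varepsilon_n/2$$
$$\le (T_{\mu^{n-1}}J_{N-1,\pi_{n-1}})(x) + \varepsilon_n/2$$
$$= J_{N,\bar{\pi}_{n-1}}(x) + \varepsilon_n/2,$$

while if $x \in \mathrm{proj}_S A(J^*_{N-1})$, we have from (3.8) and (3.9),

$$J_{N,\bar{\pi}_n}(x) = (T_{\hat{\mu}}J_{N-1,\pi_n})(x) \le (T_{\hat{\mu}}J_{N-1,\pi_{n-1}})(x) + \varepsilon_n/2 = J_{N,\bar{\pi}_{n-1}}(x) + \varepsilon_n/2.$$

In either case,

(3.15)  $J_{N,\bar{\pi}_n}(x) \le J_{N,\bar{\pi}_{n-1}}(x) + \varepsilon_n.$

From (3.13), (3.14) and (3.15) we see that $\{\bar{\pi}_n\}$ exhibits $\{\varepsilon_n\}$ dominated
convergence to optimality. QED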

We conclude this section with a technical result needed for the
development  in  Chapter  b. 


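Corollary 3.2.4 (F+)(F-)

For every $p \in P(S)$,

$$\int J^*_N(x)\,p(dx) = \inf_{\pi \in \Pi} \int J_{N,\pi}(x)\,p(dx).$$

Proof:

For $p \in P(S)$,

$$\int J^*_N(x)\,p(dx) \le \int J_{N,\pi}(x)\,p(dx)$$

for every $\pi \in \Pi$, which implies

$$\int J^*_N(x)\,p(dx) \le \inf_{\pi \in \Pi} \int J_{N,\pi}(x)\,p(dx).$$

Choose $\varepsilon > 0$ and let $\pi \in \Pi$ be $\varepsilon$-optimal. Then

$$\int J_{N,\pi}(x)\,p(dx) \le \int J^*_N(x)\,p(dx) + \varepsilon \ \text{ if } p\{x : J^*_N(x) = -\infty\} = 0;$$

$$\int J_{N,\pi}(x)\,p(dx) \le -p\{x : J^*_N(x) = -\infty\}/\varepsilon + \int_{\{J^*_N > -\infty\}} J^*_N(x)\,p(dx) + \varepsilon \ \text{ if } p\{x : J^*_N(x) = -\infty\} > 0;$$

and the reverse inequality follows. QED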

oe*t_cr^  d-_  u V-ark. 

Definition 3.9 Let $\pi = (\mu_0,\ldots,\mu_{N-1})$ be a Markov policy and
$\pi^k = (\mu_{N-k},\ldots,\mu_{N-1})$, $k=1,\ldots,N$. The policy $\pi$ is uniformly N-stage optimal if

$$J_{k,\pi^k} = J^*_k, \qquad k=1,\ldots,N.$$

Lemma 3.6

The policy $\pi = (\mu_0,\ldots,\mu_{N-1}) \in \Pi$ is uniformly N-stage optimal if and only if

$$T_{\mu_{N-k}}(T^{k-1}J_0) = T^kJ_0, \qquad k=1,\ldots,N.$$

Proof:

If $\pi$ is uniformly N-stage optimal, then

If $T_{\mu_{N-k}}(T^{k-1}J_0) = T^kJ_0$, $k=1,\ldots,N$, then

$$J^*_k = T^kJ_0 = T_{\mu_{N-k}}(T^{k-1}J_0)$$
$$= T_{\mu_{N-k}}T_{\mu_{N-k+1}}(T^{k-2}J_0)$$
$$= (T_{\mu_{N-k}} \cdots T_{\mu_{N-1}})(J_0)$$
$$= J_{k,\pi^k}, \qquad k=1,\ldots,N,$$

where the next to last equality is obtained by continuing the process used to
obtain the previous equalities. QED


Theorem 3.3 (F+)(F-)

If the infimum in

$$\inf_{u \in \Gamma_x} \Big\{ g(x,u) + \alpha \int J^*_k(x')\,t(dx'|x,u) \Big\}, \qquad k=0,\ldots,N-1,$$

is achieved for each $x \in S$, then a uniformly N-stage optimal (and hence optimal)
nonrandomized Markov policy exists. This policy is generated by the dynamic
programming algorithm, i.e. by measurably selecting for each $x$ a control $u$
which achieves the above infimum.
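For a finite model the infimum is always achieved, so the construction in Theorem 3.3 can be carried out directly. The following sketch runs the dynamic programming algorithm, records a minimizing control at each stage, and then checks the uniform optimality condition of Lemma 3.6 by evaluating the tail policies. The three-state model and all numerical data are illustrative assumptions, and measurability questions do not arise in this finite setting.

```python
# Hypothetical finite model; all data are illustrative assumptions.
S = [0, 1, 2]
Gamma = {0: [0, 1], 1: [0, 1], 2: [0]}
alpha = 1.0
N = 3
g = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 3.0, (2, 0): 0.0}
t = {                                        # t[(x, u)][y] = t({y} | x, u)
    (0, 0): [1.0, 0.0, 0.0],
    (0, 1): [0.0, 0.5, 0.5],
    (1, 0): [0.0, 1.0, 0.0],
    (1, 1): [0.0, 0.0, 1.0],
    (2, 0): [0.0, 0.0, 1.0],
}

def q_value(J, x, u):
    """g(x,u) + alpha * sum over y of J(y) t({y} | x, u)."""
    return g[(x, u)] + alpha * sum(J[y] * t[(x, u)][y] for y in S)

def dp(N):
    """Return [J_0, ..., J_N] with J_k = T^k J_0, and selectors mu[0..N-1]."""
    J = [[0.0 for _ in S]]
    mu = [None] * N
    for k in range(N):                       # J_{k+1} = T J_k
        sel = {x: min(Gamma[x], key=lambda u: q_value(J[k], x, u)) for x in S}
        J.append([q_value(J[k], x, sel[x]) for x in S])
        mu[N - k - 1] = sel                  # control used with k+1 stages to go
    return J, mu

def tail_cost(mu, k):
    """Cost of pi^k = (mu_{N-k}, ..., mu_{N-1}), evaluated backward from J_0 = 0."""
    J = [0.0 for _ in S]
    for j in range(len(mu) - 1, len(mu) - k - 1, -1):
        J = [q_value(J, x, mu[j][x]) for x in S]
    return J

J, mu = dp(N)
for k in range(1, N + 1):
    print(k, J[k], tail_cost(mu, k))
```

The selector found while computing $T(T^kJ_0)$ is stored at index $N-k-1$, so that the tail policy $(\mu_{N-k},\ldots,\mu_{N-1})$ reproduces $T^kJ_0$ for each $k$, which is the condition of Lemma 3.6.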


Proof: 

Let $\pi = (\mu_0,\ldots,\mu_{N-1})$, where $\mu_{N-k-1}: S \to C$ achieves the infimum above and
satisfies $\mu_{N-k-1}(x) \in \Gamma_x$ for every $x \in S$, $k=0,\ldots,N-1$ (Theorem 2.5). Apply
Lemma 3.6. QED

We now make continuity and compactness assumptions on (SM) which
guarantee the hypothesis of Theorem 3.3 is satisfied and consequently a
uniformly N-stage optimal nonrandomized Markov policy exists. It follows as a
by-product of these assumptions and the selection Theorem 2.7 that this policy
can be chosen to be Borel measurable.


(3.16)  $C$ is compact.

(3.17)  $\Gamma = \bigcup_{j=1}^{\infty} \Gamma^j$, where $\Gamma^1 \subset \Gamma^2 \subset \cdots$, each $\Gamma^j$ is a closed subset of $SC$, and

$$\lim_{j\to\infty} \inf_{(x,u) \in \Gamma^j - \Gamma^{j-1}} g(x,u) = +\infty.¹$$

(3.18)  $g$ is lower semicontinuous and bounded below on $\Gamma$.

(3.19)  $t(dx'|x,u)$ is continuous on $\Gamma$.


Theorem 3.4 Let (3.16) - (3.19) hold. If $J: S \to R^*$ is lower semicontinuous
and bounded below, then so is $TJ$ and there is a Borel measurable function
$\mu: S \to C$ such that $\mu(x) \in \Gamma_x$ and

$$(TJ)(x) = g(x,\mu(x)) + \alpha \int J(x')\,t(dx'|x,\mu(x))$$

for every $x \in S$.


Proof: 


The function

$$(x,u) \to g(x,u) + \alpha \int J(x')\,t(dx'|x,u)$$

is lower semicontinuous on $\Gamma$ (Theorem 2.8 and Lemma 2.5(a)). Define

$$H(x,u) = g(x,u) + \alpha \int J(x')\,t(dx'|x,u) \ \text{ if } (x,u) \in \Gamma; \qquad = +\infty \ \text{ otherwise.}$$

For $c \in R$, (3.17) and the lower boundedness of $J$ imply the existence of some $k$
such that

$$\inf_{(x,u) \in \bigcup_{j=k+1}^{\infty} (\Gamma^j - \Gamma^{j-1})} H(x,u) > c.$$

Therefore $\{(x,u) : H(x,u) \le c\} = \{(x,u) \in \Gamma : g(x,u) + \alpha \int J(x')\,t(dx'|x,u) \le c\}$ is a subset
of $\Gamma^k$ and closed in $\Gamma^k$, therefore closed in $SC$. It follows that $H$ is lower
semicontinuous on $SC$.

By Theorem 2.6, $(TJ)(x) = \min_{u \in C} H(x,u)$ is lower semicontinuous. By Theorem
2.7, there exists a Borel measurable $\bar{\mu}: S \to C$ such that $(TJ)(x) = H(x,\bar{\mu}(x))$ for
every $x \in S$. By Lemma 2.3, there exists a Borel measurable $\hat{\mu}: S \to C$ such that
$(x,\hat{\mu}(x)) \in \Gamma$ for every $x \in S$. Let

$$\mu(x) = \bar{\mu}(x) \ \text{ if } (TJ)(x) < \infty; \qquad = \hat{\mu}(x) \ \text{ otherwise.}$$

Then $\mu$ has the desired properties. QED

¹ By convention the infimum over the empty set is $+\infty$. Thus we allow the
possibility that for some $k$, $\Gamma^k = \Gamma^{k+1} = \Gamma^{k+2} = \cdots$.

We note that although (3.16) requires the compactness of $C$, the results
just proved all hold for noncompact $C$ if the sets $\Gamma^j$, $j=1,2,\ldots$, are compact.
This is true because $C$ can be homeomorphically embedded in a compact Borel
space $\bar{C}$ [11, Chapter IX, Corollary 9.2] and the images of $\Gamma^j$, $j=1,2,\ldots$, are
compact in $S\bar{C}$.

It is also possible to give simple sufficient conditions on $g'$, $f$ and $p$
in Definition 3.1 to insure that (3.18) and (3.19) hold.

Theorem 3.5 Suppose that $p(dw|x,u)$ is continuous. If $g'$ is bounded below
and lower semicontinuous, then $g$ is also. If $f$ is continuous, then $t(dx'|x,u)$
is also.


Proof:

If $p$ is continuous and $g'$ is bounded below and lower semicontinuous,
Theorem 2.8 implies that $g$ is also. By Theorem B.1, the stochastic kernel $t$
is continuous if and only if for every $r \in C(S)$, the function

$$(x,u) \to \int r(x')\,t(dx'|x,u)$$

is continuous. By definition

$$\int r(x')\,t(dx'|x,u) = \int r[f(x,u,w)]\,p(dw|x,u),$$

and the conclusion follows from Lemma 2.7. QED

Note that if $W$ is n-dimensional Euclidean space and the distribution of $w$
is given by a density $d(w;x,u)$ which is jointly continuous in $(x,u)$ for fixed
$w$, then $p(dw|x,u)$ is continuous. To see this let $G$ be an open set in $W$ and
$(x_n,u_n) \to (x,u)$ in $SC$. Then

$$\liminf_n p(G|x_n,u_n) = \liminf_n \int_G d(w;x_n,u_n)\,dw \ge \int_G d(w;x,u)\,dw = p(G|x,u)$$

by Fatou's Lemma. The continuity of $p(dw|x,u)$ follows from Theorem B.1.

In fact it is not necessary that $d$ be continuous in $(x,u)$ for each $w$, but
only that $(x_n,u_n) \to (x,u)$ imply $d(w;x_n,u_n) \to d(w;x,u)$ for Lebesgue almost all
$w$. For example, if $W = R$, the exponential density

$$d(w;x,u) = e^{-(w - m(x,u))} \ \text{ if } w \ge m(x,u); \qquad = 0 \ \text{ if } w < m(x,u),$$

where $m: SC \to R$ is continuous, has this property.


CHAPTER 4

THE INFINITE HORIZON MODELS AND THEIR RELATIONSHIPS


Section 1. The stochastic model

The stochastic model (SM) considered in this chapter is the same as given
by Definition 3.1, except that the horizon $N$ is $+\infty$. The entire discussion of
Chapter 3, Section 1, including Theorem 3.1, applies to the infinite horizon
model except that in place of the concepts of summability below and
summability above we have the assumptions

(P)  $g'(x,u,w) \ge 0$ for every $x \in S$, $u \in C$, $w \in W$;

(N)  $g'(x,u,w) \le 0$ for every $x \in S$, $u \in C$, $w \in W$.

We consider additionally the discounted case

(D)  $-b \le g'(x,u,w) \le b < +\infty$ for every $x \in S$, $u \in C$, $w \in W$; $0 < \alpha < 1$.

Any one of these assumptions guarantees the convergence in $R^*$ of the sum in
(3.4) when $K \to \infty$. All results in Chapters 4 and 5 hold in at least one of
the cases (P), (N) or (D) and these letters will appear in the statements of
the results to indicate which are applicable.

We will also implicitly assume that under (P) any function $J: S \to R^*$
actually takes values only in $[0,+\infty]$; under (N), $J$ takes values in $[-\infty,0]$;
and under (D), $J$ is bounded. One can check that these properties are
preserved by the operators $T_\mu$ and $T$ (Definitions 3.7 and 3.8). By making
these assumptions we ensure that the integral operates linearly.

When the horizon is infinite, we omit the subscript $N$ in the functions
$J_{N,\pi}$ and $J^*_N$. In terms of the operator $T_\mu$, we have in place of equality in
(3.5) for $\pi = (\mu_0,\mu_1,\ldots) \in \Pi$,

$$J_\pi = \lim_{n\to\infty} (T_{\mu_0} \cdots T_{\mu_{n-1}})(J_0).$$

If $\pi = (\mu,\mu,\ldots) \in \Pi$ has all components the same, we say $\pi$ is stationary and
write $J_\mu$ in place of $J_\pi$.

Section 2. The deterministic model

We now describe the deterministic decision model (DM) corresponding to (SM).

Definition 4.1 Let $(S,C,\Gamma,W,p,f,\alpha,g',N)$ be a stochastic decision model as
described by Definition 3.1, let $N = \infty$, and let $t$ be the transition kernel
defined by (3.1) and $g$ the (reduced) one-stage cost function defined in Chapter 3.
The corresponding deterministic model (DM) consists of the following:

$P(S)$: State space.

$P(SC)$: Control space.

$P(\Gamma) = \{q \in P(SC): q(\Gamma) = 1\}$: Constraint set. For $p \in P(S)$,

    $P(\Gamma)_p = \{q \in P(\Gamma)$: the marginal of $q$ on $S$ is $p\}$.

$\bar{f}$: System function. $\bar{f}: P(SC) \to P(S)$ is defined by

    $\bar{f}(q)(\underline{S}) = \int_{SC} t(\underline{S}|x,u)\, q(d(x,u))$,   $\underline{S} \in \mathcal{B}_S$.

$\alpha, g$: Discount factor and one-stage cost function. We treat the cases (P),
(N) and (D).

By the results of Appendix B, $P(S)$, $P(C)$ and $P(SC)$ are Borel spaces,
$P(\Gamma) = \{q \in P(SC): q(\Gamma) = 1\}$ is an analytic subset of $P(SC)$, and
$\bar{f}$ is Borel measurable. The function $q \to \int g\, dq$ is lower
semianalytic on $P(SC)$ by Corollary 2.4.1.

Definition 4.2 A policy in (DM) is a sequence of mappings $\bar{\pi} = (\bar{\mu}_0,\bar{\mu}_1,\ldots)$
such that for each $k$, $\bar{\mu}_k: P(S) \to P(SC)$ and $\bar{\mu}_k(p) \in P(\Gamma)_p$ for every $p \in P(S)$.
The set of all policies in (DM) will be denoted by $\bar{\Pi}$. We place no
measurability requirements on these mappings. If $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ has all
components the same, $\bar{\pi}$ is said to be stationary.

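Definition 4.3 Given $p_0 \in P(S)$ and a policy $\bar{\pi}$ in (DM), the cost function
corresponding to $\bar{\pi}$ at $p_0$ is

(4.1)    $\bar{J}_{\bar{\pi}}(p_0) = \sum_{k=0}^{\infty} \alpha^k \int g\, dq_k$,

where the $q_k$'s are generated recursively by

(4.2)    $q_k = \bar{\mu}_k(p_k)$,

(4.3)    $p_{k+1} = \bar{f}(q_k)$,   $k = 0,1,\ldots$.

If $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ is stationary, we write $\bar{J}_{\bar{\mu}}$ in place of $\bar{J}_{\bar{\pi}}$. The optimal
cost function at $p_0$ is

    $\bar{J}^*(p_0) = \inf_{\bar{\pi} \in \bar{\Pi}} \bar{J}_{\bar{\pi}}(p_0)$.

The concepts of $\epsilon$-optimal and optimal policies for (DM) are the same as those
given in Chapter 3 for (SM).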
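The recursion (4.1) - (4.3) can be made concrete on a finite model. The following is a minimal sketch, assuming a hypothetical two-state, two-control problem: the tables `T`, `G`, `MU` and the discount `ALPHA` are invented for illustration and do not come from the text. A stationary policy of (DM) is built from a stochastic kernel on $C$ given $S$, and the series (4.1) is truncated after finitely many terms.

```python
# Illustrative two-state, two-control model; every number here is made up.
ALPHA = 0.9
STATES, CONTROLS = (0, 1), (0, 1)
# T[(x, u)][y] plays the role of the transition kernel t({y} | x, u).
T = {(0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
     (1, 0): [0.5, 0.5], (1, 1): [0.3, 0.7]}
# G[(x, u)] plays the role of the reduced one-stage cost g(x, u).
G = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
# MU[x][u] is a stochastic kernel mu(u | x) used to build a stationary policy.
MU = {0: [0.25, 0.75], 1: [1.0, 0.0]}


def mu_bar(p):
    """Return q = mu_bar(p), a measure on SC whose marginal on S is p; see (4.2)."""
    return {(x, u): p[x] * MU[x][u] for x in STATES for u in CONTROLS}


def f_bar(q):
    """Return p' = f_bar(q), the next state distribution; see (4.3)."""
    return [sum(q[xu] * T[xu][y] for xu in q) for y in STATES]


def dm_cost(p0, stages=400):
    """Truncate the series (4.1) after `stages` terms."""
    p, total = list(p0), 0.0
    for k in range(stages):
        q = mu_bar(p)
        total += ALPHA ** k * sum(G[xu] * q[xu] for xu in q)
        p = f_bar(q)
    return total


mixed = dm_cost([0.3, 0.7])
blend = 0.3 * dm_cost([1.0, 0.0]) + 0.7 * dm_cost([0.0, 1.0])
print(mixed, blend)  # the two values agree up to rounding
```

For a policy of this particular form the cost is linear in the initial distribution, which is why the cost at the mixture matches the mixture of the costs at the two point masses.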


Definition 4.4 A sequence $(p_0,q_0,q_1,\ldots) \in P(S)P(SC)P(SC)\cdots$ is admissible
in (DM) if $q_0 \in P(\Gamma)_{p_0}$ and $q_{k+1} \in P(\Gamma)_{\bar{f}(q_k)}$, $k = 0,1,\ldots$. The set of all
admissible sequences will be denoted by $\Delta$.

The admissible sequences are just those which can be generated by some
policy via (4.2) and (4.3). Except for $p_0$, the $p_k$'s are not included in the
sequence, but these are the marginals of the $q_k$'s on $S$ and so can be recovered
from the sequence.

Lemma 4.1 The set of admissible sequences $\Delta$ in (DM) is an analytic subset of
$P(S)P(SC)P(SC)\cdots$.

Proof:

The set $\Delta$ is equal to $A_0 \cap [\bigcap_{k=0}^{\infty} B_k]$, where

    $A_0 = \{(p_0,q_0,q_1,\ldots): q_0 \in P(\Gamma)_{p_0}\}$,

    $B_k = \{(p_0,q_0,q_1,\ldots): q_{k+1} \in P(\Gamma)_{\bar{f}(q_k)}\}$.

To show that $A_0$ and $B_k$, $k = 0,1,\ldots$, are analytic, by the results of
Appendix A it suffices to show that

    $A = \{(p,q) \in P(S)P(SC): q \in P(\Gamma)_p\}$

and

    $B = \{(q_0,q_1) \in P(SC)P(SC): q_1 \in P(\Gamma)_{\bar{f}(q_0)}\}$

are analytic.

The set $A$ is the intersection of $P(S)P(\Gamma)$ with the graph of the continuous
mapping $q \to$ [marginal of $q$ on $S$] from $P(SC)$ to $P(S)$, and this is analytic by
the results of Appendix A and the cited reference (Chapter 1, Theorem 3.3).
The set $B$ is the inverse image of $A$ under the Borel measurable mapping

    $(q_0,q_1) \to (\bar{f}(q_0),q_1)$,

and so is analytic (Theorem A.3). QED


Section 3. Relations between the models

The deterministic model lends itself to a simpler analysis than does the
stochastic model and Chapter 5 is devoted to this. For the analysis to be
meaningful, however, it is necessary to establish correspondences between (SM)
and (DM) that permit transfer of results from one to the other. This is the
goal of this section.

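Definition 4.5 Let $\pi = (\mu_0,\mu_1,\ldots) \in \Pi$ be a policy in (SM) and $\bar{\pi} = (\bar{\mu}_0,\bar{\mu}_1,\ldots)$
a policy in (DM). Let $p_0$ be given. If for each $k$ and each $\underline{S} \in \mathcal{B}_S$, $\underline{C} \in \mathcal{B}_C$,

(4.4)    $\int_{\underline{S}} \mu_k(\underline{C}|x)\, p_k(dx) = \bar{\mu}_k(p_k)(\underline{S}\,\underline{C})$,

where the $p_k$'s are generated from $p_0$ by $\bar{\pi}$ via (4.2) and (4.3), then $\pi$ and $\bar{\pi}$
correspond at $p_0$. If $\pi$ and $\bar{\pi}$ correspond at every $p \in P(S)$, then $\pi$ and $\bar{\pi}$
correspond.

If $\pi$ and $\bar{\pi}$ correspond at $p_0$, then the sequence of measures $(q_0,q_1,\ldots)$
generated from $p_0$ by $\pi$ via (3.3) is the same as the sequence generated from $p_0$
by $\bar{\pi}$ via (4.2) and (4.3). If $\pi$ and $\bar{\pi}$ correspond, then they generate the same
sequence $(q_0,q_1,\ldots)$ for any initial $p_0$.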

Theorem 4.1 (P)(N)(D)

If $\pi \in \Pi$, there is a corresponding $\bar{\pi} \in \bar{\Pi}$. If $\bar{\pi} \in \bar{\Pi}$ and $p_0 \in P(S)$ is given,
there is a policy $\pi \in \Pi$ corresponding to $\bar{\pi}$ at $p_0$.

Proof:

In the first case, let $\bar{\pi} = (\bar{\mu}_0,\bar{\mu}_1,\ldots)$, where $\bar{\mu}_k$ is chosen to satisfy
(4.4) identically in $p_k$. In the second case, let $\pi = (\mu_0,\mu_1,\ldots)$, where $\mu_k$ is
chosen to satisfy (4.4) when the $p_k$'s are generated from $p_0$ by $\bar{\pi}$ via (4.2)
and (4.3). Such $\mu_k$'s exist by Theorem C.1. QED

Theorem 4.2 (P)(N)(D)

If $\pi \in \Pi$ and $\bar{\pi} \in \bar{\Pi}$ correspond at $p$, then

    $\int J_\pi(x)\, p(dx) = \bar{J}_{\bar{\pi}}(p)$.

Proof:

Let $(q_0,q_1,\ldots)$ be generated from $p$ by $\pi$ via (3.3) or from $p$ by $\bar{\pi}$ via
(4.2) and (4.3). Then

    $\bar{J}_{\bar{\pi}}(p) = \sum_{k=0}^{\infty} \alpha^k \int g\, dq_k = \int J_\pi(x)\, p(dx)$,

where the monotone or bounded convergence theorem is used to interchange
integration and summation for the second equality. QED


Corollary 4.2.1 (P)(N)(D)

If $\pi \in \Pi$ and $\bar{\pi} \in \bar{\Pi}$ correspond at $p_x$, then $J_\pi(x) = \bar{J}_{\bar{\pi}}(p_x)$.

Corollary 4.2.2 (P)(N)(D)

For every $x \in S$, $J^*(x) = \bar{J}^*(p_x)$.

Proof:

This is an immediate consequence of Theorems 3.1 and 4.1 and Corollary
4.2.1. QED


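Definition 4.6 Let $\bar{J}: P(S) \to R^*$ and $\bar{\mu}: P(S) \to P(SC)$ be such that
$\bar{\mu}(p) \in P(\Gamma)_p$ for every $p \in P(S)$. The operator $\bar{T}_{\bar{\mu}}$ mapping $\bar{J}$ into
$\bar{T}_{\bar{\mu}}\bar{J}: P(S) \to R^*$ is defined by

    $(\bar{T}_{\bar{\mu}}\bar{J})(p) = \int g\, d\bar{\mu}(p) + \alpha \bar{J}[\bar{f}(\bar{\mu}(p))]$

for every $p \in P(S)$. The operator $\bar{T}$ mapping $\bar{J}$ into $\bar{T}\bar{J}: P(S) \to R^*$ is defined
by

    $(\bar{T}\bar{J})(p) = \inf_{q \in P(\Gamma)_p} \{\int g\, dq + \alpha \bar{J}[\bar{f}(q)]\}$.

We implicitly assume that under (P) any function $\bar{J}: P(S) \to R^*$ actually
takes values only in $[0,+\infty]$; under (N), $\bar{J}$ takes values in $[-\infty,0]$; and
under (D), $\bar{J}$ is bounded. One can verify that these properties are preserved
by the operators $\bar{T}_{\bar{\mu}}$ and $\bar{T}$.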

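The next theorem is a direct consequence of the definitions.

Theorem 4.3 (P)(N)(D)

Let $J: S \to R^*$ be universally measurable and $\bar{J}(p) = \int J\, dp$. Suppose
$\bar{\mu}: P(S) \to P(SC)$ is of the form

    $\bar{\mu}(p)(\underline{S}\,\underline{C}) = \int_{\underline{S}} \mu(\underline{C}|x)\, p(dx)$,   $\underline{S} \in \mathcal{B}_S$, $\underline{C} \in \mathcal{B}_C$,

for some universally measurable stochastic kernel $\mu \in U(C|S)$. (The set $U(C|S)$
is defined in Chapter 3, Section 2.) Then $\bar{\mu}(p) \in P(\Gamma)_p$ and

    $(\bar{T}_{\bar{\mu}}\bar{J})(p) = \int (T_\mu J)\, dp$

for every $p \in P(S)$.

Theorem 4.4 (P)(N)(D)

Let $J: S \to R^*$ be lower semianalytic and $\bar{J}(p) = \int J\, dp$. Then

    $(\bar{T}\bar{J})(p) = \int (TJ)\, dp$

for every $p \in P(S)$.

Proof:

For $q \in P(\Gamma)_p$,

    $\int g\, dq + \alpha \bar{J}[\bar{f}(q)] = \int_{SC} [g(x,u) + \alpha \int_S J(x')\, t(dx'|x,u)]\, q(d(x,u))$
                          $\ge \int_S (TJ)(x)\, p(dx)$,

which implies

    $(\bar{T}\bar{J})(p) \ge \int (TJ)\, dp$.

Given $p \in P(S)$ and $\epsilon > 0$, by Theorem 2.5 there is a universally measurable
selector $\mu: S \to C$ such that $(x,\mu(x)) \in \Gamma$ for every $x \in S$ and

    $g(x,\mu(x)) + \alpha \int J(x')\, t(dx'|x,\mu(x))$
        $\le (TJ)(x) + \epsilon$   if $(TJ)(x) > -\infty$;
        $\le -(1+\epsilon^2) / (\epsilon\, p\{x: (TJ)(x) = -\infty\})$   if $(TJ)(x) = -\infty$ and $p\{x: (TJ)(x) = -\infty\} > 0$.

Let $q \in P(\Gamma)_p$ be defined by

    $q(\underline{S}\,\underline{C}) = \int_{\underline{S}} p_{\mu(x)}(\underline{C})\, p(dx)$,   $\underline{S} \in \mathcal{B}_S$, $\underline{C} \in \mathcal{B}_C$,

where $p_{\mu(x)}$ is the probability measure concentrated at $\mu(x)$. Then

    $\int g\, dq + \alpha \bar{J}[\bar{f}(q)] \le \int (TJ)(x)\, p(dx) + \epsilon$   if $p\{x: (TJ)(x) = -\infty\} = 0$;
                          $\le -1/\epsilon$   if $p\{x: (TJ)(x) = -\infty\} > 0$.

Therefore $(\bar{T}\bar{J})(p) \le \int (TJ)\, dp$. QED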

Corollary 4.2.2 has shown that $J^*$ and $\bar{J}^*$ are related, but in a rather
weak way that involves $\bar{J}^*$ only on $\tilde{S} = \{p_x \in P(S): x \in S\}$. In Theorem 4.6 we
strengthen this relationship, but in order to state that theorem we must show
a measurability property of $\bar{J}^*$.

Theorem 4.5 (P)(N)(D)

The function $\bar{J}^*$ is lower semianalytic on $P(S)$.

Proof:

Define

    $G(p_0,q_0,q_1,\ldots) = \sum_{k=0}^{\infty} \alpha^k \int g\, dq_k$   if $(p_0,q_0,q_1,\ldots) \in \Delta$;
                     $= +\infty$   otherwise.

Then $G$ is lower semianalytic on $P(S)P(SC)P(SC)\cdots$ by the remark following
Definition 4.1 and by Lemma 4.1. By Theorem 2.3,

    $\bar{J}^*(p_0) = \inf_{(q_0,q_1,\ldots) \in P(SC)P(SC)\cdots} G(p_0,q_0,q_1,\ldots)$

is lower semianalytic. QED

Corollary 4.5.1 (P)(N)(D)

The function $J^*$ is lower semianalytic on $S$.

Proof:

By Corollary B.7.1, $\theta: x \to p_x$ is continuous from $S$ onto $\tilde{S}$. By Corollary
4.2.2, $J^*(x) = \bar{J}^*(\theta(x))$, and so $\{x: J^*(x) < c\} = \theta^{-1}\{p: \bar{J}^*(p) < c\}$ is analytic
(Theorem A.3). QED

Lemma 4.2 (P)(N)(D)

Given $p \in P(S)$ and $\epsilon > 0$, there exists a policy $\bar{\pi}$ in (DM) such that

(P)(D)   $\bar{J}_{\bar{\pi}}(p) \le \int J^*(x)\, p(dx) + \epsilon$;

(N)      $\bar{J}_{\bar{\pi}}(p) \le \int J^*(x)\, p(dx) + \epsilon$   if $p\{x: J^*(x) = -\infty\} = 0$;
                    $\le -1/\epsilon$   if $p\{x: J^*(x) = -\infty\} > 0$.

Proof:

Let $p \in P(S)$ and $\epsilon > 0$ be given. Let $G$ be as defined in the proof of
Theorem 4.5.

Under (P) and (D), by Theorem 2.5 there exists a universally measurable
selector $\varphi: P(S) \to P(SC)P(SC)\cdots$ such that $(p,\varphi(p)) \in \Delta$ and $G(p,\varphi(p)) \le \bar{J}^*(p) + \epsilon$
for every $p$. Letting $\theta: x \to p_x$ and $s(x) = \varphi(\theta(x))$, we have $(p_x,s(x)) \in \Delta$ and
$G(p_x,s(x)) \le J^*(x) + \epsilon$ for every $x \in S$. By Corollary 2.2.1, $s$ is universally
measurable.

Under (N), select $s$ universally measurable so that for every $x \in S$, we have
$(p_x,s(x)) \in \Delta$ and

    $G(p_x,s(x)) \le J^*(x) + \epsilon$   if $J^*(x) > -\infty$;
                $\le -(1+\epsilon^2) / (\epsilon\, p\{x: J^*(x) = -\infty\})$   if $J^*(x) = -\infty$ and $p\{x: J^*(x) = -\infty\} > 0$.

Denote $s(x_0) = (q_0(\cdot|x_0), q_1(\cdot|x_0), \ldots)$. Each $q_k(\cdot|x_0)$ is a universally
measurable stochastic kernel on $SC$ given $S$ (Theorem C.3) satisfying

    $q_{k+1}(\cdot|x_0) \in P(\Gamma)_{\bar{f}[q_k(\cdot|x_0)]}$

for each $x_0 \in S$. Define $q_k \in P(SC)$ by

    $q_k(\underline{B}) = \int q_k(\underline{B}|x_0)\, p(dx_0)$,   $\underline{B} \in \mathcal{B}_{SC}$.

Then $q_k \in P(\Gamma)$, $k = 0,1,\ldots$. We show that $(p,q_0,q_1,\ldots) \in \Delta$. For $\underline{S} \in \mathcal{B}_S$,
$k = 1,2,\ldots$

    $q_k(\underline{S}C) = \int_S q_k(\underline{S}C|x_0)\, p(dx_0)$
              $= \int_S \int_{SC} t(\underline{S}|x,u)\, q_{k-1}(d(x,u)|x_0)\, p(dx_0)$
              $= \int_{SC} t(\underline{S}|x,u)\, q_{k-1}(d(x,u))$,

which implies $q_k \in P(\Gamma)_{\bar{f}(q_{k-1})}$. For $k = 0$,

    $q_0(\underline{S}C) = \int \chi_{\underline{S}}(x_0)\, p(dx_0) = p(\underline{S})$.

Let $\bar{\pi}$ be any policy in (DM) which generates the admissible sequence
$(p,q_0,q_1,\ldots)$. Then under (P) and (D)

    $\bar{J}_{\bar{\pi}}(p) = G(p,q_0,q_1,\ldots) = \int_S [\sum_{k=0}^{\infty} \alpha^k \int_{SC} g(x,u)\, q_k(d(x,u)|x_0)]\, p(dx_0)$
            $\le \int_S G(p_{x_0},s(x_0))\, p(dx_0) \le \int_S J^*(x_0)\, p(dx_0) + \epsilon$,

while under (N)

    $\bar{J}_{\bar{\pi}}(p) \le \int_S G(p_{x_0},s(x_0))\, p(dx_0) \le \int_S J^*(x_0)\, p(dx_0) + \epsilon$   if $p\{x: J^*(x) = -\infty\} = 0$;
                                             $\le -1/\epsilon$   if $p\{x: J^*(x) = -\infty\} > 0$. QED

Theorem 4.6 (P)(N)(D)

For every $p \in P(S)$, $\bar{J}^*(p) = \int J^*(x)\, p(dx)$.

Proof:

Lemma 4.2 shows that $\bar{J}^*(p) \le \int J^*(x)\, p(dx)$. For the reverse inequality, let
$p$ be in $P(S)$ and $\bar{\pi} \in \bar{\Pi}$. Let $\pi$ be a policy in $\Pi$ corresponding to $\bar{\pi}$ at $p$.
Then by Theorem 4.2,

    $\bar{J}_{\bar{\pi}}(p) = \int J_\pi(x)\, p(dx) \ge \int J^*(x)\, p(dx)$,

and infimizing over $\bar{\pi} \in \bar{\Pi}$ the theorem follows. QED

Corollary 4.6.1 (P)(N)(D)

Suppose $\pi \in \Pi$ and $\bar{\pi} \in \bar{\Pi}$ are corresponding policies for (SM) and (DM). Then $\pi$
is optimal if and only if $\bar{\pi}$ is optimal.

Proof:

If $\pi$ is optimal, then for $p \in P(S)$,

    $\bar{J}_{\bar{\pi}}(p) = \int J_\pi(x)\, p(dx) = \int J^*(x)\, p(dx) = \bar{J}^*(p)$.

If $\bar{\pi}$ is optimal, then for $x \in S$,

    $J_\pi(x) = \bar{J}_{\bar{\pi}}(p_x) = \bar{J}^*(p_x) = J^*(x)$. QED

We conclude with a technical result needed in the sequel.

Corollary 4.6.2 (P)(N)(D)

For every $p \in P(S)$,

    $\int J^*(x)\, p(dx) = \inf_{\pi \in \Pi} \int J_\pi(x)\, p(dx)$.

Proof:

By Theorems 4.1 and 4.2,

    $\inf_{\pi \in \Pi} \int J_\pi(x)\, p(dx) = \bar{J}^*(p)$.

Apply Theorem 4.6. QED


CHAPTER 5

MAIN RESULTS - INFINITE HORIZON

Section 1. Introduction

In this chapter we treat cases (P), (N) and (D) of Chapter 4. Due to the
infinite horizon, these conditions have been imposed on the one-stage cost
function and/or the discount factor to ensure that the cost function is
well-defined in $R^*$. Because of these conditions, it was possible to establish
strong connections between (SM) and (DM), in particular, Theorems 4.1 - 4.4
and 4.6. These connections will now be exploited.

Since we refer frequently to Bertsekas [4] in this chapter, we begin by
pointing out how (DM) satisfies assumptions made in that paper. This can be
done because there are no measurability restrictions on the policies in (DM).

Bertsekas considers a mapping $H$. In our case the arguments of $H$ are
$p \in P(S)$, $q \in P(SC)$ and $\bar{J}: P(S) \to R^*$. We define

    $H(p,q,\bar{J}) = \int g\, dq + \alpha \bar{J}[\bar{f}(q)]$,

and then $\bar{T}_{\bar{\mu}}$ and $\bar{T}$ as given in our Definition 4.6 correspond to Bertsekas' $T_\mu$
and $T$.

It is easily verified that for $H$ thus defined, Bertsekas' monotonicity
assumption holds. Furthermore, taking $\bar{J}$ to be identically zero in his
definitions of the value function corresponding to a policy and the optimal
value function, we obtain our definitions of the cost function corresponding
to a policy and the optimal cost function given in Chapter 4. Our case (D)
corresponds to his contraction assumption, our case (P) to his uniform
increase assumption, and our case (N) to his uniform decrease assumption. His
additional assumptions I.1, I.2, D.1 and D.2 are satisfied in the appropriate
cases in our model.

Results similar to those of [4] and adequate for our purposes can be
found in Bertsekas [3], Chapters 6 and 7. We will give both references
whenever possible.

Section  2 . The  optimality  equations  and  characterizations  of  optimal  policies 

Theorem 5.1 (Optimality equations) (P)(N)(D)

We have

    $\bar{J}^* = \bar{T}\bar{J}^*$,   $J^* = TJ^*$.

Proof:

This holds for (DM) by [4], Propositions 1, 5 and 6 or by [3], Chapter 6,
Proposition 2, and Chapter 7, Proposition 1. For (SM) we have for each $x \in S$,

    $J^*(x) = \bar{J}^*(p_x) = (\bar{T}\bar{J}^*)(p_x) = (TJ^*)(x)$

by Theorems 4.4, 4.6 and Corollary 4.5.1. QED
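In the discounted case (D) with finitely many states and controls, the optimality equation $J^* = TJ^*$ for (SM) can be checked numerically. The sketch below is only an illustration under assumed data: the tables `T` and `G` and the discount `ALPHA` are hypothetical, and $J^*$ is approximated by repeated application of $T$ starting from zero.

```python
# Hypothetical discounted model with two states and two controls.
ALPHA = 0.9
STATES, CONTROLS = (0, 1), (0, 1)
T = {(0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
     (1, 0): [0.5, 0.5], (1, 1): [0.3, 0.7]}   # t({y} | x, u)
G = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}  # g(x, u), so b = 3


def bellman(J):
    """(TJ)(x) = min over u of g(x,u) + alpha * sum_y J(y) t({y} | x,u)."""
    return [min(G[(x, u)] + ALPHA * sum(T[(x, u)][y] * J[y] for y in STATES)
                for u in CONTROLS)
            for x in STATES]


def approximate_optimal_cost(iterations=500):
    """Apply T repeatedly to the zero function."""
    J = [0.0 for _ in STATES]
    for _ in range(iterations):
        J = bellman(J)
    return J


J_star = approximate_optimal_cost()
residual = max(abs(a - b) for a, b in zip(J_star, bellman(J_star)))
print(J_star, residual)  # the residual of J = TJ is at rounding level
```

The computed function also stays inside the interval $[-b/(1-\alpha), b/(1-\alpha)]$, here $[-30, 30]$, as the bound on $g$ requires.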

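Theorem 5.2 (P)(N)(D)

If $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ is a stationary policy in (DM), then $\bar{J}_{\bar{\mu}} = \bar{T}_{\bar{\mu}}\bar{J}_{\bar{\mu}}$. If
$\pi = (\mu,\mu,\ldots)$ is a stationary policy in (SM), then $J_\mu = T_\mu J_\mu$.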

Proof : 


This  follows  for  (DM)  by  [4],  Proposition  1 and  Corollaries  5.1  and  b.2 
or by [3], Chapter 6, Corollary 2.1, and Chapter 7, Corollary 1.1. Let
$\pi = (\mu,\mu,\ldots)$ be a stationary policy in (SM) and $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ correspond to $\pi$.
Then for each $x \in S$,

    $J_\mu(x) = \bar{J}_{\bar{\mu}}(p_x) = (\bar{T}_{\bar{\mu}}\bar{J}_{\bar{\mu}})(p_x) = (T_\mu J_\mu)(x)$

by Theorems 4.2 and 4.3. QED


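Theorem 5.3

(P) If $\bar{J}: P(S) \to [0,+\infty]$ and $\bar{J} \ge \bar{T}\bar{J}$, then $\bar{J} \ge \bar{J}^*$.
    If $J: S \to [0,+\infty]$ is lower semianalytic and $J \ge TJ$, then $J \ge J^*$.

(N) If $\bar{J}: P(S) \to [-\infty,0]$ and $\bar{J} \le \bar{T}\bar{J}$, then $\bar{J} \le \bar{J}^*$.
    If $J: S \to [-\infty,0]$ is lower semianalytic and $J \le TJ$, then $J \le J^*$.

(D) If $\bar{J}: P(S) \to [-c,+c]$, $c < \infty$, and $\bar{J} = \bar{T}\bar{J}$, then $\bar{J} = \bar{J}^*$.
    If $J: S \to [-c,+c]$, $c < \infty$, is lower semianalytic and $J = TJ$, then $J = J^*$.

Proof:

The proof for (DM) is adapted from [3], Chapter 6, Proposition 9. Under
(P), given $\bar{p}_0 \in P(S)$ and $\epsilon > 0$, choose a sequence $\epsilon_k > 0$ such that
$\sum_{k=0}^{\infty} \alpha^k \epsilon_k \le \epsilon$. Choose $(q_0,q_1,\ldots)$ such that $(\bar{p}_0,q_0,q_1,\ldots) \in \Delta$ and

    $\int g\, dq_k + \alpha \bar{J}[\bar{f}(q_k)] \le (\bar{T}\bar{J})(p_k) + \epsilon_k$,

where $p_0 = \bar{p}_0$ and $p_k = \bar{f}(q_{k-1})$, $k = 1,2,\ldots$. Then

    $\bar{J}^*(\bar{p}_0) = \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} \sum_{k=0}^{\infty} \alpha^k \int g\, dq_k$
              $\le \liminf_{N \to \infty} [\alpha^N \bar{J}(p_N) + \sum_{k=0}^{N-1} \alpha^k \int g\, dq_k]$.

By choice of the $q_k$'s,

    $\alpha^N \bar{J}(p_N) + \sum_{k=0}^{N-1} \alpha^k \int g\, dq_k$
        $\le \alpha^{N-1} (\bar{T}\bar{J})(p_{N-1}) + \sum_{k=0}^{N-2} \alpha^k \int g\, dq_k + \alpha^{N-1} \epsilon_{N-1}$
        $\le \alpha^{N-1} \bar{J}(p_{N-1}) + \sum_{k=0}^{N-2} \alpha^k \int g\, dq_k + \alpha^{N-1} \epsilon_{N-1}$
        $\le \bar{J}(\bar{p}_0) + \sum_{k=0}^{N-1} \alpha^k \epsilon_k$
        $\le \bar{J}(\bar{p}_0) + \epsilon$,

where the next to last inequality is obtained by repeating the process used to
obtain the previous inequalities. Therefore $\bar{J}^*(\bar{p}_0) \le \bar{J}(\bar{p}_0) + \epsilon$ and the result
follows.

Under (N), for $\bar{p}_0 \in P(S)$,

    $\bar{J}^*(\bar{p}_0) = \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} \sum_{k=0}^{\infty} \alpha^k \int g\, dq_k$
              $\ge \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} \limsup_{N \to \infty} [\alpha^N \bar{J}(p_N) + \sum_{k=0}^{N-1} \alpha^k \int g\, dq_k]$
              $\ge \limsup_{N \to \infty} \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} [\alpha^N \bar{J}(p_N) + \sum_{k=0}^{N-1} \alpha^k \int g\, dq_k]$,

where $p_N = \bar{f}(q_{N-1})$, $N = 1,2,\ldots$. Now

    $\inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} [\alpha^N \bar{J}(p_N) + \sum_{k=0}^{N-1} \alpha^k \int g\, dq_k]$
        $= \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} \{\sum_{k=0}^{N-2} \alpha^k \int g\, dq_k + \alpha^{N-1} \inf_{q_{N-1} \in P(\Gamma)_{p_{N-1}}} [\int g\, dq_{N-1} + \alpha \bar{J}[\bar{f}(q_{N-1})]]\}$
        $= \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} [\sum_{k=0}^{N-2} \alpha^k \int g\, dq_k + \alpha^{N-1} (\bar{T}\bar{J})(p_{N-1})]$
        $\ge \inf_{(p_0,q_0,\ldots) \in \Delta,\ p_0 = \bar{p}_0} [\sum_{k=0}^{N-2} \alpha^k \int g\, dq_k + \alpha^{N-1} \bar{J}(p_{N-1})]$
        $\ge \bar{J}(\bar{p}_0)$,

where the last inequality is obtained by repeating the process used to obtain
the previous inequalities, and the result follows.

Under (D), $\bar{J}$ is bounded and $0 < \alpha < 1$, so $\alpha^N \bar{J}(p_N) \to 0$ as $N \to \infty$,
and both the previous arguments can be used.

We now establish the (SM) part of the theorem for (P). Cases (N) and (D)
are shown in the same manner.

Under (P) define $\bar{J}(p) = \int J\, dp$. Then

    $\bar{J}(p) = \int J\, dp \ge \int (TJ)\, dp = (\bar{T}\bar{J})(p)$

by Theorem 4.4. By the result for (DM), $\bar{J} \ge \bar{J}^*$. In particular,

    $J(x) = \bar{J}(p_x) \ge \bar{J}^*(p_x) = J^*(x)$. QED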

Theorem 5.4 Let $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ and $\pi = (\mu,\mu,\ldots)$ be stationary policies in
(DM) and (SM) respectively.

(P) If $\bar{J}: P(S) \to [0,+\infty]$ and $\bar{J} \ge \bar{T}_{\bar{\mu}}\bar{J}$, then $\bar{J} \ge \bar{J}_{\bar{\mu}}$.
    If $J: S \to [0,+\infty]$ is universally measurable and $J \ge T_\mu J$, then $J \ge J_\mu$.

(N) If $\bar{J}: P(S) \to [-\infty,0]$ and $\bar{J} \le \bar{T}_{\bar{\mu}}\bar{J}$, then $\bar{J} \le \bar{J}_{\bar{\mu}}$.
    If $J: S \to [-\infty,0]$ is universally measurable and $J \le T_\mu J$, then $J \le J_\mu$.

(D) If $\bar{J}: P(S) \to [-c,+c]$, $c < \infty$, and $\bar{J} = \bar{T}_{\bar{\mu}}\bar{J}$, then $\bar{J} = \bar{J}_{\bar{\mu}}$.
    If $J: S \to [-c,+c]$, $c < \infty$, is universally measurable and $J = T_\mu J$, then $J = J_\mu$.

Proof:

The proof for (DM) is a simplification of the one used for Theorem 5.3.
The proof for (SM) then follows from Theorem 4.3 and Corollary 4.2.1. QED

Theorem 5.4 implies that under (P), $J_\mu$ is the smallest nonnegative
universally measurable solution to the functional equation

    $J = T_\mu J$.

Under (D), $J_\mu$ is the only bounded universally measurable solution to this
equation. This provides us with a simple necessary and sufficient condition
for a stationary policy to be optimal under (P) and (D).

Theorem 5.5 (P)(D)

Let $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ and $\pi = (\mu,\mu,\ldots)$ be stationary policies in (DM) and (SM)
respectively. The policy $\bar{\pi}$ is optimal if and only if $\bar{J}^* = \bar{T}_{\bar{\mu}}\bar{J}^*$. The policy
$\pi$ is optimal if and only if $J^* = T_\mu J^*$.

Proof:

The proof for (DM) can be found in [4] or [3], but given the previous
theorems, it is quite simple, so we repeat it here.

If $\bar{\pi}$ is optimal, $\bar{J}_{\bar{\mu}} = \bar{J}^*$. By Theorem 5.2, $\bar{J}^* = \bar{T}_{\bar{\mu}}\bar{J}^*$. Conversely, if
$\bar{J}^* = \bar{T}_{\bar{\mu}}\bar{J}^*$, then by Theorem 5.4, $\bar{J}^* \ge \bar{J}_{\bar{\mu}}$ and $\bar{\pi}$ is optimal. The proof for
(SM) follows from the (SM) parts of the same theorems. QED
Corollary 5.5.1 (P)(D)

There is an optimal nonrandomized stationary policy if and only if for each
$x \in S$ the infimum in

    $\inf_{u \in \Gamma_x} \{g(x,u) + \alpha \int J^*(x')\, t(dx'|x,u)\}$

is achieved.

Proof:

If the above infimum is achieved for every $x \in S$, then by Theorem 2.5
there is a universally measurable selector $\mu: S \to C$ whose graph lies in $\Gamma$ and

    $g(x,\mu(x)) + \alpha \int J^*(x')\, t(dx'|x,\mu(x)) = \inf_{u \in \Gamma_x} \{g(x,u) + \alpha \int J^*(x')\, t(dx'|x,u)\}$.

Then $\pi = (\mu,\mu,\ldots)$ is optimal by Theorems 5.1 and 5.5.

If $\pi = (\mu,\mu,\ldots)$ is an optimal nonrandomized stationary policy, then

    $T_\mu J^* = T_\mu J_\mu = J_\mu = J^* = TJ^*$,

so $\mu(x)$ achieves the above infimum for every $x$. QED
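The construction in the proof of Corollary 5.5.1 can be illustrated on a finite discounted model, where the infimum is trivially achieved: pick, at each state, a control attaining the minimum in the optimality equation and evaluate the resulting stationary policy. This is a minimal sketch under assumed data; the tables `T` and `G`, the discount `ALPHA` and the helper names are hypothetical.

```python
# Hypothetical discounted model with two states and two controls.
ALPHA = 0.9
STATES, CONTROLS = (0, 1), (0, 1)
T = {(0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
     (1, 0): [0.5, 0.5], (1, 1): [0.3, 0.7]}   # t({y} | x, u)
G = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}  # g(x, u)


def stage_value(J, x, u):
    """g(x,u) + alpha * sum_y J(y) t({y} | x,u)."""
    return G[(x, u)] + ALPHA * sum(T[(x, u)][y] * J[y] for y in STATES)


def iterate(operator, iterations=500):
    """Apply an operator repeatedly to the zero function."""
    J = [0.0 for _ in STATES]
    for _ in range(iterations):
        J = operator(J)
    return J


# Approximate J* by iterating T, then select a minimizing control per state.
J_star = iterate(lambda J: [min(stage_value(J, x, u) for u in CONTROLS)
                            for x in STATES])
mu = [min(CONTROLS, key=lambda u, x=x: stage_value(J_star, x, u))
      for x in STATES]

# Evaluate the stationary policy (mu, mu, ...) by iterating T_mu.
J_mu = iterate(lambda J: [stage_value(J, x, mu[x]) for x in STATES])
gap = max(abs(a - b) for a, b in zip(J_mu, J_star))
print(mu, gap)  # the gap between J_mu and J* is at rounding level
```

In a finite model the selector is just a table of minimizing controls; the measurable selection theorem is what replaces this table lookup in the Borel space setting.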

Under  (N)  we  can  use  Theorem  5.3  to  obtain  a necessary  and  sufficient 
condition  for  a stationary  policy  to  be  optimal.  This  condition  is  not  as 
useful  as  that  of  Theorem  5.5,  however,  since  it  cannot  be  used  to  construct  a 
stationary  optimal  policy  in  the  manner  of  Corollary  5.5.1. 


Theorem 5.6 (N)(D)

Let $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ and $\pi = (\mu,\mu,\ldots)$ be stationary policies in (DM) and (SM)
respectively. The policy $\bar{\pi}$ is optimal if and only if $\bar{J}_{\bar{\mu}} = \bar{T}\bar{J}_{\bar{\mu}}$. The policy
$\pi$ is optimal if and only if $J_\mu = TJ_\mu$.

Proof:

Again the proof for (DM) can be found in [4] or [3] but is given here.

If $\bar{\pi}$ is optimal, $\bar{J}_{\bar{\mu}} = \bar{J}^*$. By Theorem 5.1,

    $\bar{J}_{\bar{\mu}} = \bar{J}^* = \bar{T}\bar{J}^* = \bar{T}\bar{J}_{\bar{\mu}}$.

Conversely, if $\bar{J}_{\bar{\mu}} = \bar{T}\bar{J}_{\bar{\mu}}$, then by Theorem 5.3, $\bar{J}_{\bar{\mu}} \le \bar{J}^*$ and $\bar{\pi}$ is optimal.

If $\pi$ is optimal, $J_\mu = TJ_\mu$ by the (SM) part of Theorem 5.1. The converse
is more difficult, since the (SM) part of Theorem 5.3 cannot be invoked
without knowing that $J_\mu$ is lower semianalytic. Let $\bar{\pi} = (\bar{\mu},\bar{\mu},\ldots)$ correspond
to $\pi = (\mu,\mu,\ldots)$, so that $\bar{J}_{\bar{\mu}}(p) = \int J_\mu\, dp$ for every $p \in P(S)$. Then for fixed $p \in P(S)$
and $q \in P(\Gamma)_p$,

    $\int g\, dq + \alpha \bar{J}_{\bar{\mu}}[\bar{f}(q)] = \int_{SC} [g(x,u) + \alpha \int_S J_\mu(x')\, t(dx'|x,u)]\, q(d(x,u))$
        $\ge \int_S \inf_{u \in \Gamma_x} [g(x,u) + \alpha \int_S J_\mu(x')\, t(dx'|x,u)]\, p(dx)$,
              provided the integrand is universally measurable,
        $= \int (TJ_\mu)\, dp$
        $= \int J_\mu\, dp$, so the integrand is universally measurable,
        $= \bar{J}_{\bar{\mu}}(p)$.

Infimizing the left hand side over $q \in P(\Gamma)_p$, we see that

    $\bar{T}\bar{J}_{\bar{\mu}} \ge \bar{J}_{\bar{\mu}} = \bar{T}_{\bar{\mu}}\bar{J}_{\bar{\mu}}$.

The reverse inequality always holds, and by the proof already given for (DM),
$\bar{\pi}$ is optimal. Then $\pi$ is optimal by Corollary 4.6.1. QED


Section 3. The dynamic programming algorithm

The dynamic programming algorithm is defined recursively for (DM) and (SM) by

    $\bar{J}_0 = 0$,   $\bar{J}_{k+1} = \bar{T}\bar{J}_k$,   $k = 0,1,\ldots$,

    $J_0 = 0$,   $J_{k+1} = TJ_k$,   $k = 0,1,\ldots$.

We saw in Chapter 3 that this algorithm generates the k-stage optimal
cost function $J_k^*$. For simplicity of notation, we suppress the * here. At
present we are concerned with the infinite horizon cases and the possibility
that $J_k$ may converge to $J^*$ as $k \to \infty$.

Under (P), $\bar{J}_0 \le \bar{J}_1$ and so $\bar{J}_1 = \bar{T}\bar{J}_0 \le \bar{T}\bar{J}_1 = \bar{J}_2$. Continuing, we see that
$\bar{J}_k$ is an increasing sequence of functions, and so $\bar{J}_\infty = \lim_k \bar{J}_k$ exists and
takes values in $[0,+\infty]$. Under (N), $\bar{J}_k$ is a decreasing sequence of functions
and $\bar{J}_\infty$ exists, taking values in $[-\infty,0]$. Under (D), if $-c \le \bar{J} \le c < \infty$,

    $0 \le \bar{J} + c$,
    $0 \le b + \bar{T}(\bar{J} + c) = b + \alpha c + \bar{T}\bar{J}$,
    $0 \le b + \bar{T}(b + \alpha c + \bar{T}\bar{J}) = b + \alpha b + \alpha^2 c + \bar{T}^2\bar{J}$,

and in general,

    $0 \le \lim_{k \to \infty} [b \sum_{j=0}^{k-1} \alpha^j + \alpha^k c + \bar{T}^k\bar{J}] = b/(1-\alpha) + \lim_{k \to \infty} \bar{T}^k\bar{J}$.

Similarly,

    $0 \ge -b/(1-\alpha) + \lim_{k \to \infty} \bar{T}^k\bar{J}$.

Therefore, $\lim_k \bar{T}^k\bar{J}$ exists and takes values in $[-b/(1-\alpha), b/(1-\alpha)]$. In
particular, $\bar{J}_\infty$ exists and takes values in $[-b/(1-\alpha), b/(1-\alpha)]$.

The same arguments can be used to establish the existence of $J_\infty = \lim_k J_k$.
Under (P), $J_\infty: S \to [0,+\infty]$; under (N), $J_\infty: S \to [-\infty,0]$; and under (D),
$\lim_k T^kJ: S \to [-b/(1-\alpha), b/(1-\alpha)]$, where $J: S \to [-c,c]$, $c < \infty$, is lower
semianalytic. Note that in every case, the lower level sets

    $\{J_\infty \le c\} = \bigcap_{n=1}^{\infty} \bigcup_{K=1}^{\infty} \bigcap_{k=K}^{\infty} \{J_k < c + 1/n\}$

are analytic by Theorem A.2, so $J_\infty$ is lower semianalytic.

Lemma 5.1 (P)(N)(D)

For every $p \in P(S)$, $\bar{J}_k(p) = \int J_k(x)\, p(dx)$, $k = 0,1,\ldots$ and $k = \infty$.

Proof:

Applying Theorem 4.4, use induction to prove the lemma for $k = 0,1,\ldots$.
When $k = \infty$, the lemma follows from the monotone convergence theorem under (P)
or (N) and the bounded convergence theorem under (D). QED

Theorem 5.7 (N)(D)

We have

    $\bar{J}_\infty = \bar{J}^*$,   $J_\infty = J^*$.

Indeed, under (D) the dynamic programming algorithm can be initiated from any
$\bar{J}: P(S) \to [-c,c]$, $c < \infty$, or lower semianalytic $J: S \to [-c,c]$, $c < \infty$, and
converges uniformly, i.e.

    $\lim_{k \to \infty} \sup_{p \in P(S)} |(\bar{T}^k\bar{J})(p) - \bar{J}^*(p)| = 0$,
    $\lim_{k \to \infty} \sup_{x \in S} |(T^kJ)(x) - J^*(x)| = 0$.

Proof:

The theorem for (DM) follows from [4], Proposition 1 and Lemma 1 or [3],
Chapter 6, Proposition 3, and Chapter 7, Proposition 4. By Lemma 5.1,
$J_k(x) = \bar{J}_k(p_x)$, $k = 0,1,\ldots$ and $k = \infty$, so the theorem holds for (SM) under (N) as
well. Under (D), define $\bar{J}(p) = \int J\, dp$ and use Theorem 4.4 and the (DM) result.
QED


The case (D) is the best suited for computational procedures. The
machinery developed thus far can be applied to the proof of [3], Chapter 6,
Proposition 4, to show the validity for (SM) of the error bounds given there.
We state the theorem for (SM). The analogous result is true for (DM).

Theorem 5.8 (D)

Let $J: S \to [-c,c]$, $c < +\infty$, be lower semianalytic. Then for all $x \in S$ and
$k = 1,2,\ldots$,

    $(T^kJ)(x) + \underline{c}_k \le (T^{k+1}J)(x) + \underline{c}_{k+1} \le J^*(x) \le (T^{k+1}J)(x) + \bar{c}_{k+1} \le (T^kJ)(x) + \bar{c}_k$,

where

    $\underline{c}_k = [\alpha/(1-\alpha)] \inf_{x \in S} [(T^kJ)(x) - (T^{k-1}J)(x)]$,
    $\bar{c}_k = [\alpha/(1-\alpha)] \sup_{x \in S} [(T^kJ)(x) - (T^{k-1}J)(x)]$.
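The error bounds of Theorem 5.8 are easy to compute alongside the iterates $T^kJ$ in a finite discounted model. The following sketch, again with hypothetical tables `T` and `G` and discount `ALPHA`, starts from $J = 0$, forms the lower and upper bounds for $k = 1, \ldots, 29$, and compares them with a reference value of $J^*$ obtained from a long run of the algorithm.

```python
# Hypothetical discounted model with two states and two controls.
ALPHA = 0.9
STATES, CONTROLS = (0, 1), (0, 1)
T = {(0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
     (1, 0): [0.5, 0.5], (1, 1): [0.3, 0.7]}   # t({y} | x, u)
G = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}  # g(x, u)


def bellman(J):
    """(TJ)(x) = min over u of g(x,u) + alpha * sum_y J(y) t({y} | x,u)."""
    return [min(G[(x, u)] + ALPHA * sum(T[(x, u)][y] * J[y] for y in STATES)
                for u in CONTROLS)
            for x in STATES]


# Reference value of J* from many iterations.
J_star = [0.0 for _ in STATES]
for _ in range(1000):
    J_star = bellman(J_star)

# bounds[k-1] = (T^k J + lower c_k, T^k J + upper c_k), starting from J = 0.
bounds = []
J_prev = [0.0 for _ in STATES]
J_curr = bellman(J_prev)
for k in range(1, 30):
    diffs = [a - b for a, b in zip(J_curr, J_prev)]
    c_low = ALPHA / (1.0 - ALPHA) * min(diffs)
    c_high = ALPHA / (1.0 - ALPHA) * max(diffs)
    bounds.append(([v + c_low for v in J_curr], [v + c_high for v in J_curr]))
    J_prev, J_curr = J_curr, bellman(J_curr)

TOL = 1e-9  # slack for floating point rounding
enclosed = all(lo[x] - TOL <= J_star[x] <= hi[x] + TOL
               for lo, hi in bounds for x in STATES)
nested = all(b1[0][x] <= b2[0][x] + TOL and b2[1][x] <= b1[1][x] + TOL
             for b1, b2 in zip(bounds, bounds[1:]) for x in STATES)
print(enclosed, nested)
```

The bounds typically close in on $J^*$ much faster than the raw iterates, which is why they are useful as a stopping test.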

Without further assumptions we have only the following weaker results
concerning convergence of the dynamic programming algorithm under (P).

Theorem 5.9 (P)

It holds that

    $\bar{J}_\infty \le \bar{J}^*$,   $J_\infty \le J^*$.

Furthermore, the following statements are equivalent:

(a) $\bar{J}_\infty = \bar{T}\bar{J}_\infty$,
(b) $\bar{J}_\infty = \bar{J}^*$,
(c) $J_\infty = TJ_\infty$,
(d) $J_\infty = J^*$.

Proof:

By [4], Proposition 10, it holds that $\bar{J}_\infty \le \bar{J}^*$. By the same reference or
by [3], Chapter 7, Proposition 4, (a) and (b) are equivalent. By Lemma 5.1
and Theorem 4.6, it holds that $J_\infty(x) \le J^*(x)$. Conditions (a) and (c) are
equivalent by Lemma 5.1 and Theorem 4.4. Conditions (b) and (d) are
equivalent by Lemma 5.1 and Theorem 4.6. QED

Theorem 5.10 (P)(D)

Assume that the sets

    $U_k(x,c) = \{u \in \Gamma_x: g(x,u) + \alpha \int J_k(x')\, t(dx'|x,u) \le c\}$

are compact subsets of $C$ for every $x \in S$, $c \in R$, and for all $k$ greater than some
integer $\bar{k}$. Then the equivalent conditions of Theorem 5.9 hold and there
exists a nonrandomized stationary optimal policy.

Proof:

Use the proof of Proposition 13 of Chapter 6 of [3] and Corollary 5.5.1.
QED

The  next  two  corollaries  give  conditions  under  which  the  assumptions  of 
the  theorem  are  satisfied.  Note  that  under  (D)  the  only  new  result  in  the 
theorem  and  corollaries  is  the  existence  of  a stationary  optimal  policy.  The 
equivalent  conditions  of  Theorem  5.9  always  hold  under  (D) . 

Corollary 5.10.1 (P)(D)

Assume the set $\Gamma_x$ is finite for each $x \in S$. Then the equivalent conditions of
Theorem 5.9 hold and there exists a nonrandomized stationary optimal policy.

Corollary 5.10.2 (P)(D)

Let (3.16) - (3.19) hold. Then the equivalent conditions of Theorem 5.9 hold
and there exists a Borel measurable nonrandomized stationary optimal policy.

Proof:

From the proof of Theorem 3.4 we see that the functions

    $H_n(x,u) = g(x,u) + \alpha \int J_n(x')\, t(dx'|x,u)$   if $(x,u) \in \Gamma$;
             $= +\infty$   otherwise;

are lower semicontinuous. For $c \in R$ and $n$ fixed, condition (3.17) implies the
existence of $j$ for which

    $\{(x,u) \in SC: H_n(x,u) \le c\} \subset \Gamma^j$.

This lower level set is closed by the lower semicontinuity of $H_n$ and this
implies the compactness of $U_n(x,c)$ for each $x \in S$. Theorem 5.10 can now be
invoked and it remains only to prove that the nonrandomized stationary optimal
policy whose existence is guaranteed by that theorem can be chosen to be Borel
measurable. This will follow from Theorems 3.4 and 5.5 if $J_\infty = J^*$ can be shown
to be lower semicontinuous. Under (P)

    $J_\infty = \sup_n J_n$,

while under (D)

    $J_\infty = \sup_n \{J_n - b \sum_{k=n+1}^{\infty} \alpha^k\}$.

In either case, $J_\infty$ is lower semicontinuous by Lemma 2.5(b). QED
Theorem 5.11 (P)(D)

Suppose either that the set $\Gamma_x$ is finite for each $x \in S$, or else (3.16) - (3.19)
hold. Let $\mu_k: S \to C$ be a sequence of universally measurable functions such
that $\mu_k(x) \in \Gamma_x$ for every $x$, and $T_{\mu_k}J_k = TJ_k$, $k = 0,1,\ldots$. Assume that
$\{x: J^*(x) < \infty\}$ can be partitioned into countably many disjoint universally
measurable sets $B_1, B_2, \ldots$ such that on each $B_j$ a subsequence of $\{\mu_k\}$
converges to a function $\mu'$. Then $\mu'$ can be extended to a universally
measurable function $\mu$ on $S$ such that $\mu(x) \in \Gamma_x$ for every $x$, and $\pi = (\mu,\mu,\ldots)$ is
optimal.

Proof:

Clearly $\mu'$ is universally measurable. Suppose first that (3.16) - (3.19)
hold. From the proof of Theorem 3.4 we see that the functions

    $H_n(x,u) = g(x,u) + \alpha \int J_n(x')\, t(dx'|x,u)$   if $(x,u) \in \Gamma$;
             $= +\infty$   otherwise;

are lower semicontinuous, $n = 0,1,\ldots$. Choose $x \in B_j$ and let $N_j$ be an infinite
subset of the positive integers such that $\{\mu_k\}_{k \in N_j}$ converges to $\mu'$ on $B_j$.
Then for $n$ fixed,

(5.1)    $\infty > J^*(x) = \lim_{k \in N_j} J_{k+1}(x)$
              $= \lim_{k \in N_j} [g(x,\mu_k(x)) + \alpha \int J_k(x')\, t(dx'|x,\mu_k(x))]$
              $\ge \limsup_{k \in N_j} [g(x,\mu_k(x)) + \alpha \int J_n(x')\, t(dx'|x,\mu_k(x))]$
              $= \limsup_{k \in N_j} H_n(x,\mu_k(x))$
              $\ge H_n(x,\mu'(x))$   by the lower semicontinuity of $H_n$
              $= (T_{\mu'}J_n)(x)$
              $\ge (TJ_n)(x)$.

This implies $(x,\mu'(x)) \in \Gamma$. Use von Neumann's Lemma (Appendix A) to extend $\mu'$
to a universally measurable function $\mu$ from $S$ to $C$ satisfying $(x,\mu(x)) \in \Gamma$ for
every $x$. Now let $n \to \infty$ in (5.1) and from Lemma 3.4(b) or (c) and Theorem
5.10, conclude $TJ^* = T_\mu J^*$. Apply Theorem 5.5.

If instead of (3.16) - (3.19), we have that $\Gamma_x$ is finite for each $x \in S$,
the argument is still valid if we use the finiteness of $\Gamma_x$ to show

    $\limsup_{k \in N_j} H_n(x,\mu_k(x)) \ge H_n(x,\mu'(x))$

in establishing (5.1). QED


Section 4. Existence of optimal and $\epsilon$-optimal policies

Theorem 5.12 (P)(D)

For each $\epsilon > 0$, there exists an $\epsilon$-optimal nonrandomized Markov policy for (SM),
and if $\alpha < 1$, it can be taken to be stationary. If for each $x \in S$ there exists a
policy optimal at $x$ for (SM), then an optimal nonrandomized stationary policy
exists.

Proof:

Choose $\epsilon > 0$ and $\epsilon_k > 0$ such that $\sum_{k=0}^{\infty} \alpha^k \epsilon_k = \epsilon$. If $\alpha < 1$, let $\epsilon_k = (1-\alpha)\epsilon$ for every
$k$. By Theorem 2.5, there are universally measurable functions $\mu_k: S \to C$,
$k = 0,1,\ldots$, such that $\mu_k(x) \in \Gamma_x$ for every $x$ and

    $T_{\mu_k}J^* \le J^* + \epsilon_k$.

If $\alpha < 1$, we can choose all the $\mu_k$'s identical. Then

    $(T_{\mu_{k-1}}T_{\mu_k})(J^*) \le (T_{\mu_{k-1}}J^*) + \alpha \epsilon_k \le J^* + \epsilon_{k-1} + \alpha \epsilon_k$.

Continuing this process, we have

    $(T_{\mu_0}T_{\mu_1} \cdots T_{\mu_k})(J_0) \le (T_{\mu_0}T_{\mu_1} \cdots T_{\mu_k})(J^*) \le J^* + \sum_{j=0}^{k} \alpha^j \epsilon_j \le J^* + \epsilon$,

and letting $k \to \infty$, we obtain

    $J_\pi \le J^* + \epsilon$,

where $\pi = (\mu_0,\mu_1,\ldots)$. This proves the first part of the theorem.

Suppose $\pi' = (\mu'_0,\mu'_1,\ldots)$ is a policy for (SM) which is optimal at
$x \in S$. By Theorem 3.1, there is a $\pi = (\mu_0,\mu_1,\ldots) \in \Pi$ such that $J_\pi(x) = J_{\pi'}(x) =
J^*(x)$ and

    $\int_{\underline{S}} \mu_0(\underline{C}|x_0)\, p_x(dx_0) = \int_{\underline{S}} \mu'_0(\underline{C}|x_0)\, p_x(dx_0)$

for every $\underline{S} \in \mathcal{B}_S$, $\underline{C} \in \mathcal{B}_C$. Therefore

    $(T_{\mu_0}J^*)(x) = (T_{\mu'_0}J^*)(x)$.

Now

    $J^*(x) = \lim_k (T_{\mu_0}T_{\mu_1} \cdots T_{\mu_k})(J_0)(x)$
           $= T_{\mu_0}[\lim_k (T_{\mu_1} \cdots T_{\mu_k})(J_0)](x)$   by Lemma 3.4(b) or (c)
           $\ge (T_{\mu_0}J^*)(x)$
           $\ge (TJ^*)(x)$
           $= J^*(x)$.

Consequently, we have $(T_{\mu'_0}J^*)(x) = (TJ^*)(x)$. This implies that for each $x$, the
infimum in the expression

    $\inf_{u \in \Gamma_x} \{g(x,u) + \alpha \int J^*(x')\, t(dx'|x,u)\}$

is achieved. The conclusion follows from Corollary 5.5.1. QED


Theorem  5.13  (N) 

For each $\epsilon > 0$, there exists an $\epsilon$-optimal nonrandomized semi-Markov policy. If for each $x \in S$, there exists a policy optimal at $x$ for (SM), then there exists a semi-Markov (randomized) optimal policy.

Proof: 

Under (N), $J_k \downarrow J^*$ (Theorem 5.7), and so given $\epsilon > 0$, the analytically measurable sets

$A_k = \{x : J_k(x) < J^*(x) + (\epsilon/2)$ if $J^*(x) > -\infty$, $J_k(x) \le -(1/\epsilon) - (\epsilon/2)$ if $J^*(x) = -\infty\}$

converge up to $S$. Use Corollary 3.2.2 to define for each positive integer $k$ a nonrandomized semi-Markov policy $\pi^k = (\mu_0, \ldots, \mu_{k-1}, \mu, \mu, \ldots)$ such that

$J_{k,\pi^k}(x) \le J_k(x) + \epsilon/2$ if $J_k(x) > -\infty$;
$\le -1/\epsilon$ otherwise;

for every $x \in S$. Then for $x \in A_k$,

$J_{\pi^k}(x) \le J^*(x) + \epsilon$ if $J^*(x) > -\infty$;
$\le -1/\epsilon$ otherwise.

The policy $\pi$ defined to be $\pi^k$ when the initial state is in $A_k$, but not in $A_j$ for any $j < k$, is $\epsilon$-optimal, semi-Markov and nonrandomized.

Suppose now that for each $x \in S$, there exists a policy optimal at $x$ for (SM). Let $G : P(S)P(SC)P(SC)\ldots \to [-\infty, 0]$ be as defined in Theorem 4.5. Then for each $p_x$, $x \in S$, there is an admissible sequence $(p_x, q_0, q_1, \ldots) \in A$ which achieves the infimum in

$\inf_{(p_0, q_0, \ldots) \in A,\ p_0 = p_x} G(p_0, q_0, \ldots)$

(Theorem 4.1 and Corollary 4.2.1).

The set $\{p_x : x \in S\}P(SC)P(SC)\ldots$ is analytic, so $G$ is lower semianalytic on this set. There is a universally measurable selector $\varphi : S \to P(SC)P(SC)\ldots$ such that $(p_x, \varphi(x)) \in A$ and

$G(p_x, \varphi(x)) = J^*(x)$

for every $x \in S$ (Theorems 2.5 and B.7). Denote

$\varphi(x) = (q_0(\cdot | x), q_1(\cdot | x), \ldots)$.

By Theorem C.2, for each $x$ and $k$, $q_k(\cdot | x)$ has a decomposition into its marginal $p_k(\cdot | x)$ on $S$ and a universally measurable stochastic kernel $\mu_k(du_k | x; x_k)$ on $C$ given $SS$. The policy $\pi = (\mu_0, \mu_1, \ldots)$ is semi-Markov and optimal. QED

Although randomized policies are intuitively inferior and avoided in practice, under (N) as posed here, they cannot be disregarded, as the following example demonstrates.

Example 5.1 (St. Petersburg Paradox)

Let $S = \{0, 1, 2, \ldots\}$, $C = \{0, 1\}$, $W = \{w\}$,

$g(x,u) = -2^x$ if $x \ne 0$, $u = 0$; $g(x,u) = 0$ otherwise;

$f(x,u,w) = x + 1$ if $u = 1$, $x \ne 0$; $f(x,u,w) = 0$ otherwise.

Beginning in state 1, any nonrandomized policy increases the state by one for finitely many, say $k$, moves at no cost and then jumps to zero at a cost of $-2^{k+1}$, where the state remains at no further cost. Thus $J^*(1) = -\infty$, but this value is not achieved by any nonrandomized policy. On the other hand, the randomized policy which advances with probability 1/2 when the state $x$ is nonzero yields an expected cost of $-\infty$.
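
The arithmetic behind Example 5.1 can be checked with a short sketch. It assumes an undiscounted cost ($\alpha = 1$), and the function names `stop_after_cost` and `randomized_partial_cost` are hypothetical helpers introduced here for illustration only. The first function evaluates the nonrandomized policy that advances $k$ times and then stops; the second sums the contributions of the coin-flip policy over the paths that stop within $n$ decisions, which shows the expected cost decreasing without bound as $n$ grows.

```python
from fractions import Fraction


def stop_after_cost(k: int) -> int:
    """Cost from state 1 of the nonrandomized policy: advance k times, then stop."""
    x = 1
    for _ in range(k):
        # u = 1 in a nonzero state: move to x + 1 at zero cost
        x += 1
    # u = 0 in a nonzero state: pay g(x, 0) = -2**x and jump to state 0,
    # where every later stage costs nothing
    return -(2 ** x)


def randomized_partial_cost(n: int) -> Fraction:
    """Contribution to the expected cost of the coin-flip policy from the
    paths that stop within the first n decisions (alpha = 1 assumed)."""
    expected = Fraction(0)
    for k in range(n):
        prob = Fraction(1, 2) ** (k + 1)  # advance k times, then stop
        expected += prob * stop_after_cost(k)
    return expected
```

Each path contributes exactly $-1$ to the expectation, so the partial sums are $-1, -2, -3, \ldots$, while every single nonrandomized policy has a finite cost.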


The one-stage cost $g$ in Example 5.1 is unbounded, but by a slight modification, an example can be constructed in which $g$ is bounded and the only optimal policies are randomized. If one stipulates that $J^*$ must be finite, it may be possible to restrict attention to nonrandomized policies in Theorem 5.13. This is an unsolved problem.

CHAPTER 6


GENERALIZATIONS OF RESULTS


Section 1. The nonstationary model

The model of Definition 3.1 is said to be stationary, i.e. the state, control and disturbance spaces, constraint set, cost and system function, and disturbance kernel are independent of the stage. A nonstationary model can be reduced to a stationary one by the technique described in Chapter 6, Section 7, of Bertsekas [3] or Section 8 of Schael [32]. If the state spaces $S_0, \ldots, S_{N-1}$ and the control spaces $C_0, \ldots, C_{N-1}$ are nonempty Borel spaces, the constraint sets $\Gamma_k$ are analytic subsets of $S_kC_k$ satisfying $(\Gamma_k)_{x_k} \ne \emptyset$ for every $x_k \in S_k$, $k = 0, \ldots, N-1$, the cost functions $g_k : S_kC_k \to R^*$ are lower semianalytic, $k = 0, \ldots, N-1$, and the state transition kernels $t_k(dx_{k+1} | x_k, u_k)$ are Borel measurable, $k = 0, \ldots, N-2$, then we define new state and control spaces

$S = \{(x,k) : x \in S_k\}$,

$C = \{(u,k) : u \in C_k\}$,

with the metric on $S$

$d[(x,i),(y,j)] = d_i(x,y)$ if $i = j$;

$= \mathrm{diameter}(S_i) + \mathrm{diameter}(S_j)$ if $i \ne j$;

where $d_i$ is a metric on $S_i$ consistent with its topology. If $N < \infty$, we also include in $S$ an isolated point $T$. A similar metric is defined on $C$.


Define the constraint set $\Gamma = \{((x,k),(u,k)) : 0 \le k \le N-1,\ (x,u) \in \Gamma_k\}$, the cost function

$g((x,i),(u,j)) = g_i(x,u)$ if $i = j$;

$= 0$ if $i \ne j$;

and the state transition probability

$t(\underline{S} | (x,i),(u,j)) = t_i(\{x' : (x', i+1) \in \underline{S}\} | x, u)$ if $i = j \ne N-1$;

$= p(\underline{S})$ otherwise;

where $p = p_T$ if $N < \infty$ and is any fixed probability measure on $S$ if $N = \infty$.

Defined in this manner, $S$, $C$, $\Gamma$, $g$ and $t$ satisfy the assumptions of the stationary stochastic model (SM). Conditions on the $g_k$'s such as nonnegativity or uniform boundedness in $k$ result in the same conditions on $g$. Identifying $S_k$ with $\{(x,k) : x \in S_k\}$ and $C_k$ with $\{(u,k) : u \in C_k\}$, there is a clear correspondence between the stationary and nonstationary models. The system in the stationary model moves from the set $\{(x,k) : x \in S_k\}$ to the set $\{(x,k+1) : x \in S_{k+1}\}$ with probability one at each stage. If $N < \infty$, the system moves from $\{(x,N-1) : x \in S_{N-1}\}$ to $T$ and remains there at no further cost. Universally measurable policies in the two models correspond and result in the same expected cost.

Because such a reduction is possible, results already proved for the stationary model with either a finite or an infinite horizon have immediate counterparts for the nonstationary model. An illustration of this is the nonstationary optimality equation (Theorem 5.1).
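
For finite state and control sets, the reduction can be sketched in a few lines. This is only an illustration under simplifying assumptions: the horizon is finite, measurability and the metric play no role, and the names `make_stationary`, `stage_costs`, `stage_transitions` and `TERMINAL` are hypothetical. Augmented states are pairs `(x, k)`, augmented controls are pairs `(u, k)`, and `TERMINAL` stands for the isolated point $T$.

```python
TERMINAL = "T"  # stands for the isolated point T added when N is finite


def make_stationary(stage_costs, stage_transitions):
    """Build stage-independent g and t from stage-dependent data.

    stage_costs[k](x, u)       -> cost g_k(x, u),        k = 0, ..., N-1
    stage_transitions[k](x, u) -> dict {x_next: prob},   k = 0, ..., N-2
    """
    n = len(stage_costs)

    def g(state, control):
        if state == TERMINAL:
            return 0.0  # no further cost once T is reached
        (x, i), (u, j) = state, control
        return stage_costs[i](x, u) if i == j else 0.0

    def t(state, control):
        if state == TERMINAL:
            return {TERMINAL: 1.0}  # T is absorbing
        (x, i), (u, j) = state, control
        if i == j and i != n - 1:
            # move from stage i to stage i + 1 with probability one
            return {(x2, i + 1): pr
                    for x2, pr in stage_transitions[i](x, u).items()}
        return {TERMINAL: 1.0}  # last stage (or mismatched indices): go to T

    return g, t
```

The stage index carried in the state is what makes the new cost and kernel independent of the stage, at the price of a larger state space.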

Theorem 6.1 (P)(N)(D)

Let $S_0, S_1, \ldots$; $C_0, C_1, \ldots$; $\Gamma_0, \Gamma_1, \ldots$; $g_0, g_1, \ldots$; $t_0, t_1, \ldots$ be as described above and suppose either

(P) $0 \le g_k(x,u)$ for every $(x,u) \in S_kC_k$, $k = 0, 1, \ldots$,

(N) $0 \ge g_k(x,u)$ for every $(x,u) \in S_kC_k$, $k = 0, 1, \ldots$, or

(D) $-b \le g_k(x,u) \le b < \infty$ for every $(x,u) \in S_kC_k$, $k = 0, 1, \ldots$, and $0 < \alpha < 1$.

Let $J^*(x,k)$ be the optimal cost associated with state $x$ in space $S_k$. Then for each $k$, $J^*(x,k)$ is lower semianalytic on $S_k$ and satisfies

$J^*(x,k) = \inf_{u \in (\Gamma_k)_x} \{ g_k(x,u) + \alpha \int_{S_{k+1}} J^*(x', k+1)\, t_k(dx' | x, u) \}$, $k = 0, 1, \ldots$.

We  will  henceforth  apply  nonstationary  results  and  reference  only  their 
stationary  counterparts.  We  get  an  easy  theorem  for  the  stationary  (SM)  in 
the  following  manner. 

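Definition 6.1 Given $p \in P(S)$ and $\epsilon > 0$, a policy $\pi$ in (SM) is weakly $p$-$\epsilon$-optimal provided

$\int J_{N,\pi}(x)\, p(dx) \le \int J_N^*(x)\, p(dx) + \epsilon$ if $\int J_N^*(x)\, p(dx) > -\infty$;

$\le -1/\epsilon$ if $\int J_N^*(x)\, p(dx) = -\infty$.

A policy $\pi$ in (SM) is $p$-optimal provided $p\{x : J_{N,\pi}(x) = J_N^*(x)\} = 1$.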

Theorem 6.2 (F+)(F-)(P)(N)(D)

Given $\epsilon > 0$, there is a set of nonrandomized policies

$\pi(p) = (\mu_0(du_0 | p; x_0), \ldots, \mu_{N-1}(du_{N-1} | p; x_{N-1}))$

such that for $k = 0, 1, \ldots, N-1$, $\mu_k$ is universally measurable in $(p; x_k)$ and $\pi(p)$ is weakly $p$-$\epsilon$-optimal for every $p \in P(S)$.


Proof: 

For (F+), (P) and (D), stronger results have already been proved (Corollary 3.2.2 and Theorem 5.12). Consider now the nonstationary problem where $S_0 = P(S)$, $C_0 = \{u_0\}$, $\Gamma_0 = S_0C_0$, $g_0(p, u_0) = 0$ for every $(p, u_0)$, and $t_0(\cdot | p, u_0) = p$. The subsequent state spaces, control spaces, constraint sets, cost functions, and transition probabilities are $S$, $C$, $\Gamma$, $g$, and $t$ of the stationary model respectively. If $J_N^*$ is the optimal cost function in the stationary model, then for $p \in S_0$, the optimal cost at $p$ in the nonstationary model is $\int J_N^*(x)\, p(dx)$ (Corollary 3.2.4 or Corollary 4.6.2).

There exists an $\epsilon$-optimal nonrandomized semi-Markov policy in the nonstationary model (Corollary 3.2.2 or Theorem 5.13) and, omitting the first function in this policy (which is identically $u_0$), we obtain a set of weakly $p$-$\epsilon$-optimal policies in the stationary model. QED

Section 2. The imperfect state information model

Definition An imperfect state information stochastic decision model (ISI) is the ten-tuple $(S, C, (\Gamma_0, \ldots, \Gamma_{N-1}), Z, \alpha, g, t, s_0, s, N)$ described below.

$S, C, \alpha, g, t$: State space, control space, discount factor, one-stage cost function, and state transition kernel as defined in Definition 3.1, (3.2) and (3.1).

$Z$: Observation space. A nonempty Borel space.

$\Gamma_k$: k-th constraint set. An analytic subset of $I_kC$, where $I_k = ZC \ldots CZ$, the $Z$ appearing $k+1$ times and the $C$ appearing $k$ times. An element of $I_k$ is called a k-th information vector. The constraint sets $\Gamma_k$ satisfy $(\Gamma_k)_{i_k} \ne \emptyset$ for every $i_k \in I_k$.

$s_0$: Initial observation kernel. A Borel measurable stochastic kernel on $Z$ given $S$.

$s$: Observation kernel. A Borel measurable stochastic kernel on $Z$ given $CS$.


$N$: Horizon. A positive integer or $+\infty$.

The system moves stochastically from state $x_k$ to state $x_{k+1}$ via the state transition kernel $t(dx_{k+1} | x_k, u_k)$ and generates cost at each stage of $g(x_k, u_k)$. The observation $z_{k+1}$ is stochastically generated via the observation kernel $s(dz_{k+1} | u_k, x_{k+1})$ and added to the past observations and controls $(z_0, u_0, \ldots, z_k, u_k)$ to form the (k+1)-st information vector $i_{k+1} = (z_0, u_0, \ldots, z_k, u_k, z_{k+1})$. The first information vector $i_0 = (z_0)$ is generated by the initial observation kernel $s_0(dz_0 | x_0)$, and the initial state $x_0$ has some given initial distribution $p$. The goal is to choose $u_k$ dependent on the k-th information vector $i_k$ so as to minimize

$E\{\sum_{k=0}^{N-1} \alpha^k g(x_k, u_k)\}$.

In what follows, our notation will generally indicate a finite $N$. If $N$ is infinite, the appropriate interpretation is required.
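
The sequence of events just described can be made concrete with a small simulation sketch for finite spaces. Everything here is an illustrative assumption rather than part of the model definition: distributions are represented as dictionaries mapping outcomes to probabilities, the function `simulate_isi` and its argument names are hypothetical, and the policy is nonrandomized and ignores $p$. The point of the sketch is that the controller sees only the information vector, never the state.

```python
import random


def simulate_isi(p, s0, s, t, g, policy, alpha, horizon, rng):
    """Draw one sample of sum_{k < N} alpha**k * g(x_k, u_k).

    p            : dict {x0: prob}, initial distribution
    s0(x)        : dict {z0: prob}, initial observation kernel
    s(u, x)      : dict {z: prob}, observation kernel
    t(x, u)      : dict {x_next: prob}, state transition kernel
    policy(k, i) : control chosen from the information vector i only
    """
    def draw(dist):
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return rng.choices(outcomes, weights=weights)[0]

    x = draw(p)
    info = [draw(s0(x))]              # i_0 = (z_0)
    total = 0.0
    for k in range(horizon):
        u = policy(k, tuple(info))    # the state x is hidden from the policy
        total += alpha ** k * g(x, u)
        x = draw(t(x, u))             # x_{k+1}
        z = draw(s(u, x))             # z_{k+1} depends on u_k and x_{k+1}
        info += [u, z]                # i_{k+1} = (i_k, u_k, z_{k+1})
    return total
```

Averaging many such samples gives a Monte Carlo estimate of the expected cost of a fixed policy at a fixed $p$.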

Definition 6.2 A policy in (ISI) is a sequence $\pi = (\mu_0, \ldots, \mu_{N-1})$ such that for each $k$, $\mu_k(du_k | p; i_k)$ is a universally measurable stochastic kernel on $C$ given $P(S)I_k$ satisfying

$\mu_k((\Gamma_k)_{i_k} | p; i_k) = 1$

for every $(p; i_k)$. If for each $p$, $k$ and $i_k$, $\mu_k(\cdot | p; i_k)$ assigns mass one to some point in $C$, $\pi$ is nonrandomized. A policy is said to be Borel measurable if all its component stochastic kernels are.

The concepts of Markov and semi-Markov policies are of no use in (ISI), since the initial distribution, past observations and past controls are of genuine value in estimating the current state. Thus we expect policies to depend on the initial distribution $p$ and the total information vector. In this chapter, $\Pi$ will denote the set of all policies in (ISI).

Just as we denote the set of all sequences of the form $(z_0, u_0, \ldots, u_{k-1}, z_k) \in ZC \ldots CZ$ by $I_k$ and call these sequences the k-th information vectors, we find it notationally convenient to denote the set of all sequences of the form $(x_0, z_0, u_0, \ldots, x_k, z_k, u_k) \in SZC \ldots SZC$ by $H_k$ and call these sequences the k-th history vectors. Except for $u_k$, the k-th information vector is that portion of the k-th history vector known to the controller.


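Given $p \in P(S)$ and $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$, there is a sequence of consistent probability measures $P_k(\pi, p)$ generated on $H_k$, $k = 0, \ldots, N-1$, defined on measurable rectangles by

(6.1) $P_k(\pi,p)(\underline{S}_0 \underline{Z}_0 \underline{C}_0 \ldots \underline{S}_k \underline{Z}_k \underline{C}_k) = \int_{\underline{S}_0} \int_{\underline{Z}_0} \int_{\underline{C}_0} \cdots \int_{\underline{S}_k} \int_{\underline{Z}_k} \mu_k(\underline{C}_k | p; z_0, u_0, \ldots, u_{k-1}, z_k)\, s(dz_k | u_{k-1}, x_k)\, t(dx_k | x_{k-1}, u_{k-1}) \cdots \mu_0(du_0 | p; z_0)\, s_0(dz_0 | x_0)\, p(dx_0)$.
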


Definition 6.3

Given $p \in P(S)$, a policy $\pi = (\mu_0, \ldots, \mu_{N-1})$, and a positive integer $K \le N$, the K-stage cost function corresponding to $\pi$ at $p$ is

(6.2) $J_{K,\pi}(p) = \sum_{k=0}^{K-1} \alpha^k \int g(x_k, u_k)\, dP_k(\pi, p)$.

The cost function corresponding to $\pi$ is $J_{N,\pi}$. If $N = \infty$, we impose either condition (P), (N) or (D) of Chapter 4, Section 1, on the model to ensure that the sum in (6.2) is a well-defined extended real number. If $N < \infty$, we will assume either (F-):

$\int g^+(x_k, u_k)\, dP_k(\pi, p) < \infty$, $k = 0, \ldots, N-1$,

for every $\pi \in \Pi$ and $p \in P(S)$ or (F+):

$\int g^-(x_k, u_k)\, dP_k(\pi, p) < \infty$, $k = 0, \ldots, N-1$,

for every $\pi \in \Pi$ and $p \in P(S)$.

The optimal cost function at $p$ is

$J_N^*(p) = \inf_{\pi \in \Pi} J_{N,\pi}(p)$.

The concepts of optimality at $p$, optimality, $\epsilon$-optimality at $p$ and $\epsilon$-optimality of policies are the same as those given in Definition 3.6.


To aid in the analysis of (ISI) we introduce the idea of a statistic sufficient for control, which allows us to revert to a nonstationary perfect state information model.

Definition 6.4 A statistic is a sequence $\Sigma = (\sigma_0, \ldots, \sigma_{N-1})$ of Borel measurable functions $\sigma_k : P(S)I_k \to Y_k$ where $Y_k$ is a Borel space, $k = 0, \ldots, N-1$. A statistic $\Sigma = (\sigma_0, \ldots, \sigma_{N-1})$ is sufficient for control provided:

(a) If for some $k$, $p, p' \in P(S)$, and $i_k, i_k' \in I_k$, we have $\sigma_k(p; i_k) = \sigma_k(p'; i_k')$, then $(\Gamma_k)_{i_k} = (\Gamma_k)_{i_k'}$;

(b) There exist Borel measurable stochastic kernels $\hat{t}_k(dy_{k+1} | y_k, u_k)$ on $Y_{k+1}$ given $Y_kC$ such that for every $p \in P(S)$, $\pi \in \Pi$, $\underline{Y}_{k+1} \in \mathcal{B}_{Y_{k+1}}$,

(6.4) $P_{k+1}(\pi,p)\{\sigma_{k+1}(p; \cdot) \in \underline{Y}_{k+1} | \sigma_k(p; \cdot) = y_k, u_k = u\} = \hat{t}_k(\underline{Y}_{k+1} | y_k, u)$

$P_{k+1}(\pi,p)$-almost surely [2], $k = 0, \ldots, N-2$ [1];

(c) There are lower semianalytic functions $\hat{g}_k : Y_kC \to R^*$ satisfying for every $p \in P(S)$ and $\pi \in \Pi$,

(6.5) $E\{g(x_k, u_k) | \sigma_k(p; \cdot) = y_k, u_k = u\} = \hat{g}_k(y_k, u)$

$P_k(\pi,p)$-almost surely, $k = 0, \ldots, N-1$. The expectation is with respect to $P_k(\pi,p)$.

[1] We use the notation $\sigma_{k+1}(p; \cdot) \in \underline{Y}_{k+1}$ to indicate the set $\{(x_0, z_0, u_0, \ldots, x_{k+1}, z_{k+1}, u_{k+1}) \in H_{k+1} : \sigma_{k+1}(p; z_0, u_0, \ldots, u_k, z_{k+1}) \in \underline{Y}_{k+1}\}$. Whenever a subset depending on some of the components of a cartesian product is considered, this type of notation will be employed.

[2] In this context "$P_{k+1}(\pi,p)$-almost surely" means that the set of $(x_0, z_0, u_0, \ldots, x_{k+1}, z_{k+1}, u_{k+1}) \in H_{k+1}$ for which (6.4) holds when $y_k = \sigma_k(p; i_k)$ and $u = u_k$ has $P_{k+1}(\pi,p)$ measure one.

It is no additional restriction to assume that under (P), $\hat{g}_k \ge 0$, $k = 0, \ldots, N-1$; under (N), $\hat{g}_k \le 0$, $k = 0, \ldots, N-1$; and under (D), $-b \le \hat{g}_k \le b$, $k = 0, \ldots, N-1$. We make this assumption.

Condition (a) of Definition 6.4 guarantees that the constraint set section $(\Gamma_k)_{i_k}$ can be recovered from $\sigma_k(p; i_k)$. Define for $k = 0, \ldots, N-1$,

(6.6) $\hat{\Gamma}_k = \{(y_k, u) :$ if $y_k = \sigma_k(p; i_k)$, then $(i_k, u) \in \Gamma_k\}$.

Then by (a), for any $p$, $(\hat{\Gamma}_k)_{y_k} \ne \emptyset$ for every $y_k$ and

(6.7) $\Gamma_k = \{(i_k, u) : (\sigma_k(p; i_k), u) \in \hat{\Gamma}_k\}$.

If $\Gamma_k = I_kC$, condition (a) is satisfied and $\hat{\Gamma}_k = Y_kC$. This is the case of no control constraint.

Condition (b) guarantees that the distribution of $\sigma_{k+1}$ depends only on the values of $\sigma_k$ and $u_k$. This is necessary in order that the $y_k$'s form the states of a stochastic decision model.

Condition (c) guarantees that the cost corresponding to a policy can be computed from the distributions induced on the $(y_k, u_k)$ pairs.

Definition 6.5 Let $(S, C, (\Gamma_0, \ldots, \Gamma_{N-1}), Z, \alpha, g, t, s_0, s, N)$ be an imperfect state information stochastic decision model (ISI) as described by Definition 6.1. Let $\Sigma = (\sigma_0, \ldots, \sigma_{N-1})$ be a statistic sufficient for control with range spaces $Y_k$, $k = 0, \ldots, N-1$. The perfect state information stochastic decision model (PSI) corresponding to (ISI) consists of the following:

$Y_0, \ldots, Y_{N-1}$: State spaces.

$C$: Control space.

$\hat{\Gamma}_0, \ldots, \hat{\Gamma}_{N-1}$: Constraint sets defined by (6.6).

$\alpha$: Discount factor.

$\hat{g}_0, \ldots, \hat{g}_{N-1}$: One-stage cost functions defined by (6.5).

$\hat{t}_0, \ldots, \hat{t}_{N-2}$: State transition kernels defined by (6.4).

$N$: Horizon.

Thus defined, (PSI) is a nonstationary version of (SM) as considered in Chapters 3-6. The definitions of policies and cost functions in (PSI) are analogous to those of Chapter 3, Section 1, for (SM). We will use the circumflex (^) to denote these objects in (PSI). For example, $\hat{\Pi}'$ is the set of all policies and $\hat{\Pi}$ is the set of Markov policies in (PSI).


If $\hat{\pi} = (\hat{\mu}_0, \ldots, \hat{\mu}_{N-1})$ is a policy in (PSI), then the sequence

$(\hat{\mu}_0(du_0 | \sigma_0(p; i_0)), \ldots, \hat{\mu}_{N-1}(du_{N-1} | \sigma_0(p; i_0), u_0, \ldots, u_{N-2}, \sigma_{N-1}(p; i_{N-1})))$

is a policy in (ISI). We call this policy $\hat{\pi}$ also and can regard $\hat{\Pi}'$ as a subset of $\Pi$ in this sense. If $\hat{\pi}$ is nonrandomized in (PSI), then it is also nonrandomized in (ISI). We will see in Theorem 6.3 that $\hat{\pi}$ in (PSI) and $\hat{\pi}$ in (ISI) result in the same cost.


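Given a policy $\hat{\pi} = (\hat{\mu}_0, \ldots, \hat{\mu}_{N-1}) \in \hat{\Pi}'$ and $q \in P(Y_0)$, there is a sequence of consistent probability measures $\hat{P}_k(\hat{\pi}, q)$ generated on $Y_0C \ldots Y_kC$, $k = 0, \ldots, N-1$, defined on measurable rectangles by

(6.8) $\hat{P}_k(\hat{\pi}, q)(\underline{Y}_0 \underline{C}_0 \ldots \underline{Y}_k \underline{C}_k) = \int_{\underline{Y}_0} \int_{\underline{C}_0} \cdots \int_{\underline{Y}_k} \hat{\mu}_k(\underline{C}_k | y_0, u_0, \ldots, u_{k-1}, y_k)\, \hat{t}_{k-1}(dy_k | y_{k-1}, u_{k-1}) \cdots \hat{\mu}_0(du_0 | y_0)\, q(dy_0)$.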

Define $\varphi : P(S) \to P(Y_0)$ by

(6.9) $\varphi(p)(\underline{Y}_0) = \int_S s_0(\{\sigma_0(p; \cdot) \in \underline{Y}_0\} | x_0)\, p(dx_0)$.

Thus defined, $\varphi(p)$ is the distribution of the initial state $y_0$ in (PSI) when the initial state $x_0$ in (ISI) has distribution $p$. The mapping

$(x_0, p) \to s_0(\{\sigma_0(p; \cdot) \in \underline{Y}_0\} | x_0) = s_0(\sigma_0^{-1}(\underline{Y}_0)_p | x_0)$

can be shown to be Borel measurable in the same way the transition kernel $t$ defined by (3.1) was shown to be Borel measurable. Now apply Corollaries B.3.1 and B.3.3 to conclude $\varphi$ is Borel measurable.

Theorem 6.3 (F+)(F-)(P)(N)(D) [3]

If $\hat{\pi} \in \hat{\Pi}'$ and $p \in P(S)$, then

$J_{N,\hat{\pi}}(p) = \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$.

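Proof:

We show by induction that when $\hat{\pi} \in \hat{\Pi}'$, $p \in P(S)$, $k = 0, \ldots, N-1$, and $\underline{Y}_0 \in \mathcal{B}_{Y_0}$, $\underline{C}_0 \in \mathcal{B}_C$, $\ldots$, $\underline{Y}_k \in \mathcal{B}_{Y_k}$, $\underline{C}_k \in \mathcal{B}_C$,

(6.10) $P_k(\hat{\pi}, p)\{\sigma_0(p; \cdot) \in \underline{Y}_0, u_0 \in \underline{C}_0, \ldots, \sigma_k(p; \cdot) \in \underline{Y}_k, u_k \in \underline{C}_k\} = \hat{P}_k(\hat{\pi}, \varphi(p))(\underline{Y}_0 \underline{C}_0 \ldots \underline{Y}_k \underline{C}_k)$.

For $k = 0$,

$P_0(\hat{\pi}, p)\{\sigma_0(p; \cdot) \in \underline{Y}_0, u_0 \in \underline{C}_0\} = \int_S \int_{\{\sigma_0(p; \cdot) \in \underline{Y}_0\}} \hat{\mu}_0(\underline{C}_0 | \sigma_0(p; z_0))\, s_0(dz_0 | x_0)\, p(dx_0)$ by (6.1)

$= \int_{\underline{Y}_0} \hat{\mu}_0(\underline{C}_0 | y_0)\, \varphi(p)(dy_0)$ by (6.9)

$= \hat{P}_0(\hat{\pi}, \varphi(p))(\underline{Y}_0 \underline{C}_0)$ by (6.8).

Assume (6.10) holds for $k$. Then

$P_{k+1}(\hat{\pi}, p)\{\sigma_0(p; \cdot) \in \underline{Y}_0, u_0 \in \underline{C}_0, \ldots, \sigma_{k+1}(p; \cdot) \in \underline{Y}_{k+1}, u_{k+1} \in \underline{C}_{k+1}\}$

$= \int_{\{\sigma_0(p; \cdot) \in \underline{Y}_0, u_0 \in \underline{C}_0, \ldots, \sigma_k(p; \cdot) \in \underline{Y}_k, u_k \in \underline{C}_k\}} \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | \sigma_0(p; i_0), u_0, \ldots, u_k, y_{k+1})\, \hat{t}_k(dy_{k+1} | \sigma_k(p; i_k), u_k)\, dP_k(\hat{\pi}, p)$ by (6.1) and (6.4)

$= \int_{\underline{Y}_0 \underline{C}_0 \ldots \underline{Y}_k \underline{C}_k} \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | y_0, u_0, \ldots, u_k, y_{k+1})\, \hat{t}_k(dy_{k+1} | y_k, u_k)\, d\hat{P}_k(\hat{\pi}, \varphi(p))$ by the induction hypothesis

$= \hat{P}_{k+1}(\hat{\pi}, \varphi(p))(\underline{Y}_0 \underline{C}_0 \ldots \underline{Y}_{k+1} \underline{C}_{k+1})$ by (6.8).

Now

$J_{N,\hat{\pi}}(p) = \sum_{k=0}^{N-1} \alpha^k \int_{H_k} g(x_k, u_k)\, dP_k(\hat{\pi}, p)$ by (6.2)

$= \sum_{k=0}^{N-1} \alpha^k \int_{H_k} \hat{g}_k(\sigma_k(p; i_k), u_k)\, dP_k(\hat{\pi}, p)$ by (6.5)

$= \sum_{k=0}^{N-1} \alpha^k \int_{Y_0C \ldots Y_kC} \hat{g}_k(y_k, u_k)\, d\hat{P}_k(\hat{\pi}, \varphi(p))$ by (6.10)

$= \int_{Y_0} [\sum_{k=0}^{N-1} \alpha^k \int_{Y_0C \ldots Y_kC} \hat{g}_k(y_k, u_k)\, d\hat{P}_k(\hat{\pi}, p_{y_0})]\, \varphi(p)(dy_0)$ by (6.8) (and the monotone or bounded convergence theorem under (P), (N) or (D))

$= \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$ by (3.4) (cf. (3.3) and (6.8)).

[3] The assumption (P), (N), or (D) is, as defined in Definition 6.3, on the (ISI) model. We have defined the (PSI) model in such a way that if (P), (N), or (D) holds for (ISI), then the same assumption holds for (PSI). The assumption (F+) or (F-) is placed on both models, since either one can satisfy such a condition and the other violate it.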


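Corollary 6.3.1 (F+)(F-)(P)(N)(D)

For every $p \in P(S)$,

$J_N^*(p) \le \int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0)$.

Proof:

The function $\hat{J}_N^*$ is lower semianalytic (Corollary 3.2.1 or Corollary 4.5.1), so the above integral is defined. For $p \in P(S)$,

$J_N^*(p) = \inf_{\pi \in \Pi} J_{N,\pi}(p) \le \inf_{\hat{\pi} \in \hat{\Pi}'} J_{N,\hat{\pi}}(p)$

$= \inf_{\hat{\pi} \in \hat{\Pi}'} \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$ by Theorem 6.3

$= \int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0)$ by Corollary 3.2.4 or Corollary 4.6.2. QED

We now show that $J_N^*$ and $\hat{J}_N^*$ correspond in the same way that $J_{N,\hat{\pi}}$ and $\hat{J}_{N,\hat{\pi}}$ correspond in Theorem 6.3.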

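Lemma 6.1 (F+)(F-)(P)(N)(D)

For $\pi \in \Pi$, $p \in P(S)$, there exists $\hat{\pi} \in \hat{\Pi}$ for which

$J_{N,\pi}(p) = \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$.

Proof:

Let $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$ and $p \in P(S)$. Let $Q_k(\pi, p)$ be probability measures on $Y_kC$, $k = 0, \ldots, N-1$, defined on measurable rectangles by

(6.11) $Q_k(\pi, p)(\underline{Y}_k \underline{C}_k) = P_k(\pi, p)\{\sigma_k(p; \cdot) \in \underline{Y}_k, u_k \in \underline{C}_k\}$.

By Theorem C.2, there exist Borel measurable stochastic kernels $\hat{\mu}_k(du_k | y_k)$ on $C$ given $Y_k$ satisfying

(6.12) $Q_k(\pi, p)(\underline{Y}_k \underline{C}_k) = \int_{\underline{Y}_k} \hat{\mu}_k(\underline{C}_k | y_k)\, Q_k(\pi, p)(dy_k\, C)$

for every $\underline{Y}_k \in \mathcal{B}_{Y_k}$, $\underline{C}_k \in \mathcal{B}_C$, $k = 0, \ldots, N-1$.

Then for $k = 0, \ldots, N-1$,

$1 = P_k(\pi, p)\{(i_k, u_k) \in \Gamma_k\}$

$= P_k(\pi, p)\{(\sigma_k(p; \cdot), u_k) \in \hat{\Gamma}_k\}$ by (6.7)

$= Q_k(\pi, p)(\hat{\Gamma}_k)$

$= \int_{Y_k} \hat{\mu}_k((\hat{\Gamma}_k)_{y_k} | y_k)\, Q_k(\pi, p)(dy_k\, C)$,

which implies $\hat{\mu}_k((\hat{\Gamma}_k)_{y_k} | y_k) = 1$ $Q_k(\pi, p)$-almost surely. Redefining $\hat{\mu}_k(\cdot | y_k)$ on a set of $Q_k(\pi, p)$ measure zero if necessary, we can assume that (6.12) holds and $\hat{\pi} = (\hat{\mu}_0, \ldots, \hat{\mu}_{N-1}) \in \hat{\Pi}$.

We show by induction that for $\underline{Y}_k \in \mathcal{B}_{Y_k}$, $\underline{C}_k \in \mathcal{B}_C$, and $k = 0, \ldots, N-1$,

(6.13) $Q_k(\pi, p)(\underline{Y}_k \underline{C}_k) = \hat{P}_k(\hat{\pi}, \varphi(p))\{y_k \in \underline{Y}_k, u_k \in \underline{C}_k\}$.

For $k = 0$,

$Q_0(\pi, p)(\underline{Y}_0 \underline{C}_0) = \int_{\underline{Y}_0} \hat{\mu}_0(\underline{C}_0 | y_0)\, \varphi(p)(dy_0)$ by (6.1), (6.9), (6.11) and (6.12)

$= \hat{P}_0(\hat{\pi}, \varphi(p))\{y_0 \in \underline{Y}_0, u_0 \in \underline{C}_0\}$ by (6.8).

Assume (6.13) holds for $k$. Then

$Q_{k+1}(\pi, p)(\underline{Y}_{k+1} \underline{C}_{k+1}) = \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | y_{k+1})\, Q_{k+1}(\pi, p)(dy_{k+1}\, C)$ by (6.12)

$= \int_{\{\sigma_{k+1}(p; \cdot) \in \underline{Y}_{k+1}\}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | \sigma_{k+1}(p; i_{k+1}))\, dP_{k+1}(\pi, p)$ by (6.11)

$= \int_{H_k} \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | y_{k+1})\, \hat{t}_k(dy_{k+1} | \sigma_k(p; i_k), u_k)\, dP_k(\pi, p)$ by (6.4) and the consistency of the $P_k(\pi, p)$'s

$= \int_{Y_kC} \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | y_{k+1})\, \hat{t}_k(dy_{k+1} | y_k, u_k)\, dQ_k(\pi, p)$ by (6.11)

$= \int_{Y_0C \ldots Y_kC} \int_{\underline{Y}_{k+1}} \hat{\mu}_{k+1}(\underline{C}_{k+1} | y_{k+1})\, \hat{t}_k(dy_{k+1} | y_k, u_k)\, d\hat{P}_k(\hat{\pi}, \varphi(p))$ by the induction hypothesis

$= \hat{P}_{k+1}(\hat{\pi}, \varphi(p))\{y_{k+1} \in \underline{Y}_{k+1}, u_{k+1} \in \underline{C}_{k+1}\}$ by (6.8).

Finally,

$J_{N,\pi}(p) = \sum_{k=0}^{N-1} \alpha^k \int g(x_k, u_k)\, dP_k(\pi, p)$

$= \sum_{k=0}^{N-1} \alpha^k \int \hat{g}_k(y_k, u_k)\, d\hat{P}_k(\hat{\pi}, \varphi(p))$ by (6.5), (6.11) and (6.13)

$= \int_{Y_0} [\sum_{k=0}^{N-1} \alpha^k \int \hat{g}_k(y_k, u_k)\, d\hat{P}_k(\hat{\pi}, p_{y_0})]\, \varphi(p)(dy_0)$ by (6.8) (and the monotone or bounded convergence theorem under (P), (N) or (D))

$= \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$ by (3.4) (cf. (3.3) and (6.8)). QED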

Theorem 6.4 (F+)(F-)(P)(N)(D) [4]

For $p \in P(S)$,

(6.14) $J_N^*(p) = \int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0)$.

Also, if $\hat{\pi}$ is optimal, $\varphi(p)$-optimal or weakly $\varphi(p)$-$\epsilon$-optimal in (PSI), then $\hat{\pi}$ is optimal, optimal at $p$ or $\epsilon$-optimal at $p$, respectively, in (ISI).

[4] See Footnote 3.

Proof:

Equation (6.14) follows from Corollary 6.3.1 and Lemma 6.1. If $\hat{\pi}$ is weakly $\varphi(p)$-$\epsilon$-optimal in (PSI), then by Theorem 6.3

$J_{N,\hat{\pi}}(p) = \int_{Y_0} \hat{J}_{N,\hat{\pi}}(y_0)\, \varphi(p)(dy_0)$

$\le \int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0) + \epsilon$ if $\int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0) > -\infty$;

$\le -1/\epsilon$ if $\int_{Y_0} \hat{J}_N^*(y_0)\, \varphi(p)(dy_0) = -\infty$.

Equation (6.14) implies that $\hat{\pi}$ is $\epsilon$-optimal at $p$. If $\hat{\pi}$ is optimal or $\varphi(p)$-optimal in (PSI), a similar argument using Theorem 6.3 and (6.14) shows that $\hat{\pi}$ is optimal or optimal at $p$, respectively, in (ISI).

We  will  show  shortly  that  a statistic  sufficient  for  control  always 
exists  and,  indeed,  in  many  cases  can  be  chosen  so  that  (PSI)  is  stationary. 
The  existence  of  such  a statistic  for  (ISI)  and  the  consequent  existence  of 
the  corresponding  (PSI)  enable  us  to  utilize  the  results  of  Chapters  3,  4 and 
5.  For  example,  we  have  the  following  corollary  to  Theorem  6.4. 

Corollary  6.4.1  (F+) (F“) (P) (N) (D) 

If a statistic $\Sigma = (\sigma_0, \ldots, \sigma_{N-1})$ sufficient for control exists for (ISI), then for every $\epsilon > 0$, there exists an $\epsilon$-optimal nonrandomized policy for (ISI) which depends on $i_k$ only through $\sigma_k(p; i_k)$, i.e. has the form

$\pi = (\mu_0(p; \sigma_0(p; i_0)), \ldots, \mu_{N-1}(p; \sigma_{N-1}(p; i_{N-1})))$.

Proof: 

Apply  Theorems  6.2  and  6.4.  QED 


The other specific results which can be derived for (ISI) from Chapters 3-5 are obvious and will not be exhaustively listed. We content ourselves with describing the dynamic programming algorithm over a finite horizon.

By Theorem 3.2, the dynamic programming algorithm has the following form under (F+) and (F-), where we assume (PSI) is stationary:

(6.15) $\hat{J}_0(y) = 0$ for every $y$,

(6.16) $\hat{J}_{k+1}(y) = \inf_{u \in \hat{\Gamma}_y} \{ \hat{g}(y,u) + \alpha \int \hat{J}_k(y')\, \hat{t}(dy' | y, u) \}$, $k = 0, \ldots, N-1$.

If the infimum in (6.16) is achieved for every $y$ and $k = 0, \ldots, N-1$, then there exist universally measurable functions $\hat{\mu}_k : Y \to C$ such that for every $y$ and $k = 0, \ldots, N-1$, $\hat{\mu}_k(y) \in \hat{\Gamma}_y$ and $\hat{\mu}_k(y)$ achieves the infimum in (6.16). Then $\hat{\pi} = (\hat{\mu}_0, \ldots, \hat{\mu}_{N-1})$ is optimal in (PSI) (Theorem 3.3) and $(\hat{\mu}_0 \circ \sigma_0, \ldots, \hat{\mu}_{N-1} \circ \sigma_{N-1})$ is optimal in (ISI) (Theorem 6.4).
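
For a model with finitely many states and controls, the recursion (6.15)-(6.16) reduces to a few nested loops. The sketch below is an illustration under that finiteness assumption; the function name `dynamic_program` and the dictionary representation of the transition kernel are hypothetical choices, and the infimum is a minimum that is always achieved. The indexing convention is stated explicitly as an assumption: the rule computed in the last pass of the recursion is the one applied at the first stage, so the list of rules is reversed before it is returned.

```python
def dynamic_program(states, controls, g, t, alpha, horizon):
    """Finite-horizon dynamic programming on a finite stationary model.

    controls(y) -> iterable of admissible controls at y
    g(y, u)     -> one-stage cost
    t(y, u)     -> dict {y_next: prob}
    Returns (J_N, rules) with rules[k][y] the control to apply at stage k.
    """
    J = {y: 0.0 for y in states}                      # (6.15)
    rules = []
    for _ in range(horizon):
        new_J, rule = {}, {}
        for y in states:
            best_u, best = None, float("inf")
            for u in controls(y):
                # bracketed expression in (6.16) for this (y, u)
                value = g(y, u) + alpha * sum(
                    pr * J[y2] for y2, pr in t(y, u).items())
                if value < best:
                    best_u, best = u, value
            new_J[y], rule[y] = best, best_u
        J = new_J
        rules.append(rule)
    rules.reverse()   # the rule computed last is used at the first stage
    return J, rules
```

In the imperfect state information setting the states `y` would be values of the sufficient statistic, which is only practical when those values can be enumerated or parameterized.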

In many cases $\sigma_{k+1}(p; i_{k+1})$ is a function of $\sigma_k(p; i_k)$, $u_k$ and $z_{k+1}$. The computational procedure in such a case is to first construct $(\hat{\mu}_0, \ldots, \hat{\mu}_{N-1})$ via (6.15) and (6.16), then compute $y_0 = \sigma_0(p; i_0)$ from the initial distribution and the initial observation, and apply control $u_0 = \hat{\mu}_0(y_0)$. Given $y_k$, $u_k$ and $z_{k+1}$, compute $y_{k+1}$ and apply control $u_{k+1} = \hat{\mu}_{k+1}(y_{k+1})$, $k = 0, \ldots, N-2$. In this way the information contained in $(p; i_k)$ has been condensed into $y_k$. This condensation of information is the historical motivation for statistics sufficient for control, but is peripheral to the theoretical development here.

Turning to the question of the existence of a statistic sufficient for control, it is not surprising to discover that the sequence of identity mappings on $P(S)I_k$, $k = 0, \ldots, N-1$, is such an object. Although this represents no condensation of information, it is sufficient to justify our analysis thus far. If the constraint sets $\Gamma_k$ are equal to $I_kC$, $k = 0, \ldots, N-1$, then the functions mapping $P(S)I_k$ into the distribution of $x_k$ conditioned on $(p; i_k)$, $k = 0, \ldots, N-1$, constitute a statistic sufficient for control. This statistic has the property that its value at the (k+1)-st stage is a function of its value at the k-th stage, $u_k$ and $z_{k+1}$ (cf. (6.21)), so it represents a genuine condensation of information. It also results in a stationary perfect state information model and, if these distributions can be characterized by a set of parameters, results in enormous computational simplification. This latter condition is the case, for example, if it is possible to show beforehand that all these distributions are Gaussian.

We  prove  these  facts  in  reverse  order. 

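Lemma 6.2 There exist Borel measurable stochastic kernels $r_0(dx_0 | p; z_0)$ on $S$ given $P(S)Z$ and $r(dx | p; u, z)$ on $S$ given $P(S)CZ$ which satisfy

(6.17) $\int_{\underline{S}_0} s_0(\underline{Z}_0 | x_0)\, p(dx_0) = \int_S \int_{\underline{Z}_0} r_0(\underline{S}_0 | p; z_0)\, s_0(dz_0 | x_0)\, p(dx_0)$

for every $\underline{S}_0 \in \mathcal{B}_S$, $\underline{Z}_0 \in \mathcal{B}_Z$, $p \in P(S)$, and

(6.18) $\int_{\underline{S}} s(\underline{Z} | u, x)\, p(dx) = \int_S \int_{\underline{Z}} r(\underline{S} | p; u, z)\, s(dz | u, x)\, p(dx)$

for every $\underline{S} \in \mathcal{B}_S$, $\underline{Z} \in \mathcal{B}_Z$, $p \in P(S)$ and $u \in C$.

Proof:

For fixed $(p; u)$, define a measure $q$ on $SZ$ by specifying its values on measurable rectangles to be

$q(\underline{S}\, \underline{Z} | p; u) = \int_{\underline{S}} s(\underline{Z} | u, x)\, p(dx)$.

Then $q$ is a Borel measurable stochastic kernel on $SZ$ given $P(S)C$ (Corollary B.3.1) and can be decomposed into $r(dx | p; u, z)$ and $q(S\, dz | p; u)$ (Theorem C.2). Equation (6.18) follows and (6.17) is a special case of (6.18). QED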
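
When $S$ and $Z$ are finite, the kernel $r$ of Lemma 6.2 is the familiar Bayes rule, and (6.18) can be verified by direct summation. The sketch below assumes finite spaces and exact rational arithmetic; `posterior` and `filter_identity_sides` are hypothetical helper names, and the choice made for observations of probability zero is just one admissible version of the kernel.

```python
from fractions import Fraction


def posterior(p, s, u, z):
    """A version of r(. | p; u, z) for finite S.

    p       : dict {x: prob}
    s(u, x) : dict {z: prob}, observation kernel
    """
    joint = {x: p[x] * s(u, x).get(z, 0) for x in p}
    total = sum(joint.values())
    if total == 0:
        # z has probability zero under (p, u); any distribution may be used
        return dict(p)
    return {x: w / total for x, w in joint.items()}


def filter_identity_sides(p, s, u, S_set, Z_set):
    """Return the left and right sides of (6.18) for the sets S_set, Z_set."""
    lhs = sum(p[x] * sum(s(u, x).get(z, 0) for z in Z_set) for x in S_set)
    rhs = sum(
        p[x] * s(u, x).get(z, 0)
        * sum(posterior(p, s, u, z).get(x2, 0) for x2 in S_set)
        for x in p for z in Z_set)
    return lhs, rhs
```

The two sides agree because summing the posterior against the marginal distribution of $z$ recovers the joint distribution $q$ used in the proof.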


It is customary to call $p$, the given distribution of $x_0$, the a priori distribution of the initial state. After $z_0$ is observed, the distribution is "up-dated", i.e. the distribution of $x_0$ conditioned on $z_0$ is computed. The up-dated distribution is called the a posteriori distribution and is just $r_0(\cdot | p; z_0)$. At the k-th stage, $k \ge 1$, we will have some a priori distribution $p_k$ of $x_k$ based on $i_{k-1} = (z_0, u_0, \ldots, u_{k-2}, z_{k-1})$. Control $u_{k-1}$ is applied, some $z_k$ is observed, and an a posteriori distribution of $x_k$ conditioned on $(i_{k-1}, u_{k-1}, z_k)$ is computed. This distribution is just $r(\cdot | p_k; u_{k-1}, z_k)$. The process of passing from an a priori to an a posteriori distribution in this manner is called filtering. Theorem C.2 is crucial in establishing the filtering equations (6.17) and (6.18).


Define $f_u : P(S) \to P(S)$ by

(6.19) $f_u(p)(\underline{S}) = \int t(\underline{S} | x, u)\, p(dx)$, $\underline{S} \in \mathcal{B}_S$.

Equation (6.19) can be termed the one-stage prediction equation. If $x_k$ has a posteriori distribution $p_k$ and the control $u_k$ is chosen, then the a priori distribution of $x_{k+1}$ is $f_{u_k}(p_k)$. This will be up-dated to the a posteriori distribution as soon as $z_{k+1}$ is observed (cf. (6.21)).

The mapping $(u, p) \to f_u(p)$ is Borel measurable (Corollaries B.3.1 and B.3.3). Given a sequence $i_k \in I_k$ such that $i_{k+1} = (i_k, u_k, z_{k+1})$, $k = 0, \ldots, N-2$, and given $p \in P(S)$, define recursively

(6.20) $p_0(p; i_0) = r_0(\cdot | p; z_0)$,

(6.21) $p_{k+1}(p; i_{k+1}) = r(\cdot | f_{u_k}[p_k(p; i_k)]; u_k, z_{k+1})$, $k = 0, \ldots, N-2$.
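
For finite state and observation sets, the recursion (6.20)-(6.21) alternates a prediction step (6.19) with a Bayes update. The following sketch illustrates this under the finiteness assumption; the names `predict`, `update` and `belief`, and the representation of the information vector as a flat tuple `(z0, u0, z1, u1, z2, ...)`, are hypothetical choices made for this example.

```python
from fractions import Fraction


def predict(p, t, u):
    """Prediction step (6.19): distribution of x_{k+1} given p_k and u_k."""
    out = {}
    for x, px in p.items():
        for x2, pr in t(x, u).items():
            out[x2] = out.get(x2, 0) + px * pr
    return out


def update(p, likelihood, z):
    """Bayes update of p after observing z; likelihood(x) -> dict {z: prob}."""
    joint = {x: px * likelihood(x).get(z, 0) for x, px in p.items()}
    total = sum(joint.values())
    # for a probability-zero observation any version of the kernel is allowed
    return {x: w / total for x, w in joint.items()} if total else dict(p)


def belief(p, s0, s, t, info):
    """Compute p_k(p; i_k) for the information vector info = (z0, u0, z1, ...)."""
    z0, rest = info[0], info[1:]
    b = update(p, s0, z0)                                   # (6.20)
    for u, z in zip(rest[0::2], rest[1::2]):                # (6.21)
        b = update(predict(b, t, u), lambda x, u=u: s(u, x), z)
    return b
```

Only the current distribution, the latest control and the latest observation are needed at each step, which is the condensation of information referred to in the text.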


Lemma  6.3  Let  ptP(S)  and  ft  = (^Q, ...  ) E IF  be  given.  Then  for  3kE  ®s, 


(6.22) 


Pk(n,pHxkt£k!ik}  = Pk(p;ikH£k) 


Pk ( fr , p ) -almost  surely,  k=0, . . . ,N-1 . 


Proof : 


We  proceed  by  induction.  By  definition 

^z0tZ0}Po(P:io)(2o)dP°(,T’P)  = ^ZotZ0}r°(5oIP;Zo)dPo(,r’P) 

* LL  r(20!p;z0)s0(dz0lx0)p(dx0) 

= \ SQC^IxQjpCdXQ)  by  (6.17) 

^o 

= P0(7r,p)(x0tZo,z0tZo} 

for  Sq*  @5,  Zq t 6 2>  and  30  Py  the  definition  of  conditional  probability 

PoU.pHxoiSoiio}  = P0(p;i0)(Z0) 


P0(’T,p) -almost  surely. 


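Assume that (6.22) holds for k. For $\underline{I}_k \in \mathcal{B}_{I_k}$, $\underline{C}_k \in \mathcal{B}_C$, $\underline{Z}_{k+1} \in \mathcal{B}_Z$ and $\underline{S}_{k+1} \in \mathcal{B}_S$,

\[
\begin{aligned}
&\int_{\{i_k\in\underline{I}_k,\,u_k\in\underline{C}_k,\,z_{k+1}\in\underline{Z}_{k+1}\}} p_{k+1}(p;i_k,u_k,z_{k+1})(\underline{S}_{k+1})\,dP_{k+1}(\pi,p) \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_S\int_{\underline{Z}_{k+1}} p_{k+1}(p;i_k,u_k,z_{k+1})(\underline{S}_{k+1})\,s(dz_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by (6.1)} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_S\int_{\underline{C}_k}\int_S\int_{\underline{Z}_{k+1}} p_{k+1}(p;i_k,u_k,z_{k+1})(\underline{S}_{k+1})\,s(dz_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,\mu_k(du_k|p;i_k)\,[p_k(p;i_k)(dx_k)]\,dP_k(\pi,p) \qquad \text{by the induction hypothesis} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_S\int_S\int_{\underline{Z}_{k+1}} p_{k+1}(p;i_k,u_k,z_{k+1})(\underline{S}_{k+1})\,s(dz_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,[p_k(p;i_k)(dx_k)]\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by Fubini's Theorem} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_S\int_{\underline{Z}_{k+1}} r(\underline{S}_{k+1}\,|\,f_{u_k}[p_k(p;i_k)];u_k,z_{k+1})\,s(dz_{k+1}|u_k,x_{k+1})\,f_{u_k}[p_k(p;i_k)](dx_{k+1})\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by (6.19) and (6.21)} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_{\underline{S}_{k+1}} s(\underline{Z}_{k+1}|u_k,x_{k+1})\,f_{u_k}[p_k(p;i_k)](dx_{k+1})\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by (6.18)} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_S\int_{\underline{S}_{k+1}} s(\underline{Z}_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,[p_k(p;i_k)(dx_k)]\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by (6.19)} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_S\int_{\underline{C}_k}\int_{\underline{S}_{k+1}} s(\underline{Z}_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,\mu_k(du_k|p;i_k)\,[p_k(p;i_k)(dx_k)]\,dP_k(\pi,p) \qquad \text{by Fubini's Theorem} \\
&\quad= \int_{\{i_k\in\underline{I}_k\}}\int_{\underline{C}_k}\int_{\underline{S}_{k+1}} s(\underline{Z}_{k+1}|u_k,x_{k+1})\,t(dx_{k+1}|x_k,u_k)\,\mu_k(du_k|p;i_k)\,dP_k(\pi,p) \qquad \text{by the induction hypothesis} \\
&\quad= P_{k+1}(\pi,p)\{i_k\in\underline{I}_k,\ u_k\in\underline{C}_k,\ x_{k+1}\in\underline{S}_{k+1},\ z_{k+1}\in\underline{Z}_{k+1}\}.
\end{aligned}
\]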

Therefore  (6.22)  holds  for  k+1.  QED 
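
When the state, control and observation spaces are finite, the recursion (6.20)-(6.21) becomes the familiar Bayes filter. The following is a minimal sketch under that finiteness assumption, which the text does not make. The function names `predict`, `update` and `filter_step`, the table layout of `t` and `s`, and the two-state model are all hypothetical and serve only to illustrate how $f_u$ and $r$ compose.

```python
from fractions import Fraction as F

def predict(p, t, u):
    """f_u(p): one-step state prediction, sum over x of t(x'|x,u) p(x)."""
    n = len(p)
    return [sum(t[u][x][y] * p[x] for x in range(n)) for y in range(n)]

def update(q, s, u, z):
    """r(.|q;u,z): condition q on the observation z, where s[u][x][z] = s(z|u,x)."""
    weights = [s[u][x][z] * q[x] for x in range(len(q))]
    total = sum(weights)
    if total == 0:
        return list(q)  # z has probability zero under q; any version may be used
    return [w / total for w in weights]

def filter_step(p_k, t, s, u, z):
    """One application of (6.21): p_{k+1} = r(.|f_u[p_k];u,z)."""
    return update(predict(p_k, t, u), s, u, z)

# Hypothetical two-state model with a single control u = 0.
t = {0: [[F(9, 10), F(1, 10)], [F(2, 10), F(8, 10)]]}   # t[u][x][x']
s = {0: [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]}       # s[u][x'][z]
p_next = filter_step([F(1, 2), F(1, 2)], t, s, u=0, z=1)
print(p_next)
```

Exact rational arithmetic is used so that the updated distribution can be checked by hand. In the Borel-space setting of the text the normalization in `update` is replaced by the conditional distribution $r$, which is defined only up to sets of measure zero.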

Theorem 6.6  If $\Gamma_k = I_kC$, k=0,...,N-1, then $(p_0(p;\cdot),\ldots,p_{N-1}(p;\cdot))$ as defined by (6.20) and (6.21) is a statistic sufficient for control and the resulting perfect state information model is stationary.

Proof: 

Let $Y_k$ in Definition 6.4 be P(S), k=0,...,N-1. The mapping $p_0$ of (6.20) is Borel measurable, and $p_{k+1}$ is the composition of Borel measurable mappings whenever $p_k$ is Borel measurable. Thus $(p_0,\ldots,p_{N-1})$ is a statistic.

Condition  (a)  of  Definition  6.4  is  satisfied  when  k=0,...,N-1. 

We  verify  (6.4)  and  (6.5). 

For $y \in P(S)$, $u \in C$ and $\underline{Y} \in \mathcal{B}_{P(S)}$, define

\[ \underline{Z}(y,u,\underline{Y}) = \{z \in Z:\ r(\cdot\,|\,f_u(y);u,z) \in \underline{Y}\}, \]

\[ \hat{t}(\underline{Y}|y,u) = \int_S\int_S s(\underline{Z}(y,u,\underline{Y})\,|\,u,x')\,t(dx'|x,u)\,y(dx). \]

Note that $\underline{Z}(y,u,\underline{Y})$ is the (y,u)-section of the inverse image of $\underline{Y}$ under a Borel measurable mapping, and the measure on Z defined by

\[ \int_S\int_S s(\underline{Z}|u,x')\,t(dx'|x,u)\,y(dx), \qquad \underline{Z} \in \mathcal{B}_Z, \]

depends Borel measurably on (y,u) (Corollaries B.3.1 and B.3.3). Consequently, $\hat{t}$ can be shown to be a Borel measurable stochastic kernel on P(S) given P(S)C by the same argument as was used to show the state transition kernel t (Definition 3.2) is a Borel measurable stochastic kernel.

We have for $\pi \in \Pi$, $p \in P(S)$, and k=0,...,N-2,

\[ P_{k+1}(\pi,p)\{p_{k+1}(p;\cdot) \in \underline{Y} \mid p_k(p;\cdot)=y,\ u_k=u\} = P_{k+1}(\pi,p)\{z_{k+1} \in \underline{Z}(y,u,\underline{Y}) \mid p_k(p;\cdot)=y,\ u_k=u\} \]

$P_{k+1}(\pi,p)$-almost surely, where the expectations are with respect to $P_{k+1}(\pi,p)$. Thus (6.4) is satisfied. Note that $\hat{t}$ is independent of k.

For k=0,...,N-1, $\pi \in \Pi$ and $p \in P(S)$, by Lemma 6.3

\[ E\{g(x_k,u_k) \mid p_k(p;\cdot)=y,\ u_k=u\} = \int_S g(x_k,u)\,y(dx_k) = \hat{g}(y,u) \]

$P_{k+1}(\pi,p)$-almost surely, where the expectation is with respect to $P_{k+1}(\pi,p)$. Note that $\hat{g}$ is lower semianalytic (Theorem 2.4) and independent of k.

Theorem 6.6  The set of identity mappings on $P(S)I_k$, k=0,...,N-1, is a statistic sufficient for control.

Proof:

Let $Y_k$ in Definition 6.4 be $P(S)I_k$ and let $\sigma_k$ be the identity mapping on $P(S)I_k$, k=0,...,N-1. Condition (a) of Definition 6.4 is clearly satisfied. We verify (6.4) and (6.5).

For $\pi \in \Pi$, $p \in P(S)$, and k=0,...,N-2, the distribution

\[ P_{k+1}(\pi,p)\{d\sigma_{k+1} \mid \sigma_k(p;i_k)=y,\ u_k=u\} \]

depends Borel measurably on

\[ P_{k+1}(\pi,p)\{dz_{k+1} \mid \sigma_k(p;i_k)=y,\ u_k=u\}. \]

But for $\underline{Z} \in \mathcal{B}_Z$, by Lemma 6.3

\[ P_{k+1}(\pi,p)\{z_{k+1} \in \underline{Z} \mid \sigma_k(p;i_k)=y,\ u_k=u\} = \int_S\int_S s(\underline{Z}|u,x')\,t(dx'|x,u)\,[p_k(y)(dx)] \]

$P_{k+1}(\pi,p)$-almost surely. This last expression is Borel measurable in (y,u), so (6.4) holds.

For k=0,...,N-2, $\pi \in \Pi$ and $p \in P(S)$, by Lemma 6.3

(6.23)  $E\{g(x_k,u_k) \mid \sigma_k(p;i_k)=y,\ u_k=u\} = \int_S g(x_k,u)\,[p_k(y)(dx_k)]$

$P_{k+1}(\pi,p)$-almost surely and (6.5) holds. The right hand side of (6.23) is lower semianalytic in (y,u) by Theorem 2.4. QED




APPENDIX  A 
ANALYTIC  SETS 

This  appendix  summarizes  the  pertinent  facts  about  analytic  sets 
available  in  the  literature.  There  are  several  equivalent  definitions  of 
analytic sets in use. The one given here is a variation of that found on page 15 of [26].

Definition A.1  Let N be the cross product of countably many copies of the positive integers. Let the set of positive integers have the discrete topology and N the product topology. A separable metric set A is analytic if there is a continuous function f mapping N onto A.

The definition in [26] requires that A be embedded in a complete separable metric space. Given an A satisfying Definition A.1, it can be embedded in its metric completion without affecting the continuity of f, so this requirement is really superfluous.

Note  also  that  N as  defined  in  Definition  A.1  is  homeomorphic  to  N',  the 
set  of  irrationals  in  (0,1)  with  the  usual  topology  [19,  p.  25].  N could  be 
replaced  by  N'  in  Definition  A.1,  and  indeed  this  is  the  characterization  of 
analytic  sets  found  in  Section  39  of  [19]. 

Theorems A.1 - A.3 are proved in Chapter I, Section 3, of [26].

Theorem A.1  Let X be a Borel space, i.e. a Borel subset of a complete separable topological space. Then X is analytic.

Theorem A.2  The countable union, intersection and cross product of analytic sets is analytic.


Theorem A.3  Let A and B be analytic subsets of Borel spaces X and Y respectively. If f is a Borel measurable function from X to Y, then $f(A)$ and $f^{-1}(B)$ are analytic.

Corollary A.3.1  Let X and Y be Borel spaces and A an analytic subset of the Cartesian product XY. Then

\[ \mathrm{proj}_X A = \{x \in X:\ \text{for some } y,\ (x,y) \in A\} \]

is analytic.

Proof:

The projection mapping is Borel measurable, in fact continuous, from XY to X. QED


Definition A.2  Let $\{A(n_1,\ldots,n_k)\}$ be a system of sets in some space, where $(n_1,\ldots,n_k)$ ranges over the set of finite sequences of positive integers. The set

\[ \bigcup_{(n_1,n_2,\ldots)}\ \bigcap_{k=1}^{\infty} A(n_1,\ldots,n_k) \]

is called the result of operation (A) applied to the system $\{A(n_1,\ldots,n_k)\}$.

Definition A.3  The system $\{A(n_1,\ldots,n_k)\}$ is regular if

\[ A(n_1,\ldots,n_k,n_{k+1}) \subset A(n_1,\ldots,n_k) \]

for each $(n_1,n_2,\ldots)$ and k.
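
Operation (A) ranges over all infinite sequences of positive integers, so it cannot be evaluated literally. As a rough illustration only, the sketch below computes a finite truncation: index sequences are drawn from a finite alphabet and cut off at a fixed depth. The names `operation_A`, `is_regular` and the example system `multiples` are hypothetical and do not appear in the text.

```python
from itertools import product
from math import prod

def operation_A(system, alphabet, depth):
    """Union, over index sequences of length `depth`, of the intersection of
    system(n_1), system(n_1, n_2), ..., system(n_1, ..., n_depth)."""
    result = set()
    for seq in product(alphabet, repeat=depth):
        inter = set(system(seq[:1]))
        for k in range(2, depth + 1):
            inter &= system(seq[:k])
        result |= inter
    return result

def is_regular(system, alphabet, depth):
    """Check A(n_1,...,n_k,n_{k+1}) is a subset of A(n_1,...,n_k) up to `depth`."""
    return all(system(seq[:k + 1]) <= system(seq[:k])
               for seq in product(alphabet, repeat=depth)
               for k in range(1, depth))

def multiples(index):
    """Hypothetical system: A(n_1,...,n_k) = multiples of n_1*...*n_k below 60."""
    return set(range(0, 60, prod(index)))

result = operation_A(multiples, alphabet=(2, 3), depth=2)
```

For a regular system the intersection along a sequence is just the last set visited, which is why the truncated result here is the set of multiples of 4, 6 or 9 below 60.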

Definition A.4  A collection $\mathcal{J}$ of subsets of a space is invariant under the operation (A) if whenever each of the sets of the system $\{A(n_1,\ldots,n_k)\}$ is in $\mathcal{J}$, the result of operation (A) applied to the system is in $\mathcal{J}$.




The  proof  of  the  next  theorem  can  be  found  in  [30],  Chapter  II,  Section 
5. 

Theorem A.4  Let X be a Borel space and m a measure on $\mathcal{B}_X$, the $\sigma$-algebra of Borel subsets of X. Let $\mathcal{J}$ be the completion of $\mathcal{B}_X$ with respect to m. Then $\mathcal{J}$ is invariant under operation (A).

Theorem A.5  Let X be a complete separable metric space and let $\{A(n_1,\ldots,n_k)\}$ be a regular system of closed subsets of X such that for each fixed $(n_1,n_2,\ldots)$, diameter$(A(n_1,\ldots,n_k)) \to 0$ as $k \to \infty$. Then the result of performing operation (A) on this system is an analytic set. Conversely, every analytic subset of X can be obtained in this manner.

See [26], Chapter I, Section 3, for a proof.

Definition A.5  Given a Borel space X, the intersection of all completions with respect to finite measures of the Borel $\sigma$-algebra $\mathcal{B}_X$ is called the universal $\sigma$-algebra $\mathcal{U}_X$. A set in $\mathcal{U}_X$ is said to be universally measurable.

If X is a Borel space and m is a finite measure on $\mathcal{B}_X$, then m has a unique extension to a measure on $\mathcal{U}_X$. We denote this extension by m also, writing m(E) instead of m*(E) when $E \in \mathcal{U}_X$. Likewise, if f is a real-valued function on X, measurable with respect to $\mathcal{U}_X$, we will write $\int f\,dm$ to indicate the integral of f with respect to the completion of m on $\mathcal{B}_X$.

Corollary A.5.1  Let X be a Borel space. Every analytic subset of X is universally measurable.

Proof:

Given any finite measure m on $\mathcal{B}_X$, let $\mathcal{J}(m)$ denote the completion of $\mathcal{B}_X$ with respect to m. Let A be an analytic subset of X. By Theorem A.5, A is the result of operation (A) applied to a system of sets in $\mathcal{J}(m)$. By Theorem A.4, A is in $\mathcal{J}(m)$. Since m is arbitrary, A is in $\mathcal{U}_X$. QED


Definition A.6  Given a Borel space X, the analytic $\sigma$-algebra $\mathcal{A}_X$ is the smallest $\sigma$-algebra in X containing the class of analytic sets.

The analytic $\sigma$-algebra is strictly larger than the class of analytic sets, since the complement of an analytic set is analytic only in the special case that both are Borel [26, p. 20]. Analytic sets which are not Borel do exist [19, p. 460].

We have the following selection theorem originally proved by von Neumann [22]. The version given here can be found in [6].


Theorem A.6  Let X and Y be Borel sets and A an analytic subset of XY. Then there exists a function $\varphi: \mathrm{proj}_X A \to Y$ such that $(x,\varphi(x)) \in A$ for every $x \in \mathrm{proj}_X A$ and $\varphi^{-1}(B) \in \mathcal{A}_X$ for every Borel subset B of Y.$^1$


Theorem A.7  The analytic subsets of a Borel space X coincide with the projections on the X-axis of the closed sets in XN', where N' is the set of irrationals in (0,1) with the usual topology.

Proof:

By Theorem A.3, the projections on the X-axis of the closed sets in XN' are analytic. Now let A be an analytic subset of X. Then A = f(N'), where f is continuous, and so the mapping $(x,z) \to d(x,f(z))$ is continuous from XN' to R, where d is a metric on X consistent with its topology. The inverse image of {0} under this mapping, which is {(x,z): x=f(z)}, is closed, and A is the projection of this set on the X-axis. QED


$^1$ This latter property is called analytic measurability (Definition 2.2).


APPENDIX B
MEASURABLE SETS OF MEASURES

This  appendix  collects  standard  results  about  the  space  of  probability 
measures  on  a Borel  space.  Some  variations  of  these  results  are  proved. 

Given a Borel space X, i.e. a Borel subset of a complete separable topological space, denote by C(X) the set of bounded continuous real-valued functions on X and by P(X) the set of probability measures on $(X,\mathcal{B}_X)$. If d is a metric on X consistent with the given topology, then $U_d(X)$ will denote the set of functions in C(X) which are uniformly continuous with respect to d.

Following [26], Chapter II, Section 6, define a topology on P(X) by taking the basic open sets to be the class of sets of the form

\[ V_p(f_1,\ldots,f_k;\epsilon_1,\ldots,\epsilon_k) = \Big\{q \in P(X):\ \Big|\int f_i\,dq - \int f_i\,dp\Big| < \epsilon_i,\ i=1,\ldots,k\Big\}, \]

where $f_i \in C(X)$, $\epsilon_i > 0$, i=1,...,k; $p \in P(X)$; and k is a positive integer. We will always understand P(X) to be equipped with this topology and will denote by $\mathcal{B}_{P(X)}$ the $\sigma$-algebra generated by the open sets in P(X). This notation will be justified by Corollary B.6.1, which states that P(X) is also a Borel space.

It follows from the metrizability and separability of X that P(X) is metrizable and separable [26, Chapter II].

Theorem B.1  Let $\{p_n\}$ be a sequence in P(X) and let $p \in P(X)$. The following statements are equivalent:

(a)  $p_n \to p$;

(b)  for every f in C(X), $\lim_n \int f\,dp_n = \int f\,dp$;

(c)  for some metric d consistent with the topology on X and every g in $U_d(X)$, $\lim_n \int g\,dp_n = \int g\,dp$;

(d)  $\limsup_n p_n(F) \le p(F)$ for every closed set F;

(e)  $\liminf_n p_n(G) \ge p(G)$ for every open set G;

(f)  $\lim_n p_n(B) = p(B)$ for every Borel set B whose boundary has p measure zero.

Proof: 

The  equivalence  of  (a)  and  (b)  is  a simple  consequence  of  the  definition 
of  the  topology  on  P(X).  Clearly  (b)  implies  (c).  If  (c)  holds,  then  since 
the  topology  on  P(X)  depends  only  on  the  topology  of  X and  not  on  any 
particular  metrization,  (a)  must  hold  by  [26],  Chapter  II,  Theorem  6.1  and  the 
metrizability  of  P(X).  The  other  equivalences  follow  from  the  same  theorem. 
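
A standard example, not taken from the text, makes conditions (b), (d) and (e) concrete: on X = [0,1] let $p_n$ be the unit point mass at 1/n and p the unit point mass at 0. The sketch below evaluates the relevant quantities for a long initial segment of the sequence. The helper names `dirac_integral` and `dirac_mass` and the choice of test function are assumptions made for the illustration, and a finite segment can only suggest the limiting behaviour, not prove it.

```python
import math

def dirac_integral(f, x):
    """Integral of f with respect to the unit point mass at x."""
    return f(x)

def dirac_mass(indicator, x):
    """Mass that the unit point mass at x assigns to the set {y: indicator(y)}."""
    return 1 if indicator(x) else 0

points = [1.0 / n for n in range(1, 2001)]   # p_n is the point mass at 1/n
limit = 0.0                                   # p is the point mass at 0

# (b): integrals of a bounded continuous function converge.
gap = abs(dirac_integral(math.cos, points[-1]) - dirac_integral(math.cos, limit))

# (d): closed set F = {0}; p_n(F) = 0 for every n while p(F) = 1.
in_F = lambda y: y == 0.0
masses_F = [dirac_mass(in_F, x) for x in points]

# (e): open set G = X - {0}; p_n(G) = 1 for every n while p(G) = 0.
in_G = lambda y: y != 0.0
masses_G = [dirac_mass(in_G, x) for x in points]
```

Both inequalities are strict here. The boundary of F = {0} has p measure one, so condition (f) does not apply to F, and indeed $p_n(F)$ does not converge to p(F).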


We  now  exhibit  a countable  basis  which  generates  the  topology  in  P(X). 


Theorem B.2  Let X be a Borel space. Let D be dense in P(X). Then there exists a sequence $\{g_1,g_2,\ldots\}$ in C(X) such that the topology on P(X) is generated by the class of basic open sets

\[ \mathcal{U}_3 = \{V_p(g_1,\ldots,g_k;\epsilon_1,\ldots,\epsilon_k):\ p \in D,\ \epsilon_i > 0 \text{ rational},\ k \text{ a positive integer}\}. \]

Proof: 

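By [26], Chapter II, Theorem 6.6, there is a sequence $\{g_1,g_2,\ldots\}$ of functions in C(X) such that whenever $\{p_n\}$ is a sequence in P(X), $p_n \to p$ if and only if $\int g_k\,dp_n \to \int g_k\,dp$ for every k.

Let $\mathcal{T}_1$ be the standard topology on P(X), i.e. the topology generated by the sets $V_p(f_1,\ldots,f_k;\epsilon_1,\ldots,\epsilon_k)$ with $f_i \in C(X)$. Let

\[ \mathcal{U}_2 = \{V_p(g_1,\ldots,g_k;\epsilon_1,\ldots,\epsilon_k):\ p \in P(X),\ \epsilon_i > 0,\ k \text{ a positive integer}\}, \]

and take $\mathcal{T}_2$ to be the topology for which $\mathcal{U}_2$ is a basis. Denote by $\mathcal{T}_3$ the topology generated by $\mathcal{U}_3$. Then $\mathcal{T}_3 \subset \mathcal{T}_2 \subset \mathcal{T}_1$.

If $G \in \mathcal{T}_2$ and $q \in G$, there is a neighborhood $V_{p_0}(g_1,\ldots,g_k;\epsilon_1,\ldots,\epsilon_k)$ containing q and contained in G. Let $\{p_n\}$ be a sequence in D with $p_n \to p_0$ in the $\mathcal{T}_1$ topology. By Theorem B.1(b), $\int g_i\,dp_n \to \int g_i\,dp_0$, i=1,...,k, and so there exists an index $n_0$ such that

\[ \Big|\int g_i\,dp_{n_0} - \int g_i\,dp_0\Big| < \tfrac{1}{2}\Big[\epsilon_i - \Big|\int g_i\,dp_0 - \int g_i\,dq\Big|\Big], \qquad i=1,\ldots,k. \]

Choose $\epsilon_i'$ rational such that

\[ \Big|\int g_i\,dp_{n_0} - \int g_i\,dp_0\Big| + \Big|\int g_i\,dp_0 - \int g_i\,dq\Big| < \epsilon_i' < \tfrac{1}{2}\Big[\epsilon_i + \Big|\int g_i\,dp_0 - \int g_i\,dq\Big|\Big], \qquad i=1,\ldots,k. \]

Then

\[ q \in V_{p_{n_0}}(g_1,\ldots,g_k;\epsilon_1',\ldots,\epsilon_k') \subset V_{p_0}(g_1,\ldots,g_k;\epsilon_1,\ldots,\epsilon_k), \]

and so $G \in \mathcal{T}_3$. Therefore $\mathcal{T}_2 \subset \mathcal{T}_3$.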

Let G be in $\mathcal{T}_1$. Then P(X)-G is closed relative to $\mathcal{T}_1$. Let p be a limit point of P(X)-G relative to $\mathcal{T}_2$, i.e. there exists a net $\{p_\alpha\}$ in P(X)-G such that $p_\alpha \to p$ relative to $\mathcal{T}_2$. Then for every k,

\[ \int g_k\,dp_\alpha \to \int g_k\,dp. \]

This net contains a sequence $\{p_n\}$ for which

\[ \int g_k\,dp_n \to \int g_k\,dp \]

for every k, and by choice of the $g_k$'s, $p_n \to p$ relative to $\mathcal{T}_1$. Therefore p is a limit point of P(X)-G relative to $\mathcal{T}_1$ and so is in P(X)-G. This proves P(X)-G is closed relative to $\mathcal{T}_2$ and $G \in \mathcal{T}_2$. It follows that $\mathcal{T}_1 = \mathcal{T}_2$. QED

Theorem B.3  Let X be a Borel space and $\mathcal{F}$ a class of subsets of X which generates the $\sigma$-algebra $\mathcal{B}_X$. Then $\mathcal{B}_{P(X)}$ is the smallest $\sigma$-algebra on P(X) with respect to which the mappings

\[ \theta_B: p \to p(B), \qquad B \in \mathcal{F}, \]

are measurable.

Proof: 

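Let $\Sigma^*$ be the smallest $\sigma$-algebra for which the mappings $\theta_B$, $B \in \mathcal{F}$, are measurable.

Let $\mathcal{E}$ be the class of Borel sets E in X for which $\theta_E$ is $\mathcal{B}_{P(X)}$ measurable. We show $\mathcal{E}$ is a Dynkin system:

(a)  The space X is in $\mathcal{E}$, since p(X) = 1 for every p.

(b)  If $A, B \in \mathcal{E}$ and $B \subset A$, then for Q the set of rational numbers and c real,

\[ \{p:\ p(A-B) < c\} = \bigcup_{r \in Q} \{p:\ p(A) < c+r,\ p(B) > r\} \]

is $\mathcal{B}_{P(X)}$ measurable. Therefore $A-B \in \mathcal{E}$.

(c)  If $A_1, A_2, \ldots$ are in $\mathcal{E}$ and $A_n \uparrow A$, then

\[ \{p:\ p(A) \le c\} = \bigcap_{n=1}^{\infty} \{p:\ p(A_n) \le c\} \]

is $\mathcal{B}_{P(X)}$ measurable. Therefore $A \in \mathcal{E}$.

This shows $\mathcal{E}$ is a Dynkin system.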

Now let F be a closed subset of X and let d be a metric on X consistent with its topology. If F=X, then $F \in \mathcal{E}$. If $F \ne X$, define

\[ F_n = \{x:\ d(x,F) > 1/n\}, \qquad n=1,2,\ldots, \]

where $d(x,F) = \inf_{y \in F} d(x,y)$. For n sufficiently large, $F_n$ is nonempty. When $F_n$ is nonempty, the nonnegative function

\[ f_n(x) = d(x,F_n)/[d(x,F_n) + d(x,F)] \]

is in C(X), is identically one on F and is identically zero on $F_n$. For n sufficiently large, choose $p \in P(X)$ which assigns mass one to some element in $F_n$. For c>0,

\[ \Big\{q:\ \int f_n\,dq < c\Big\} = V_p(f_n;c) \]

is open in P(X), so the mappings $q \to \int f_n\,dq$ are $\mathcal{B}_{P(X)}$-measurable. The mapping $\theta_F$ is a monotone limit of these mappings, so $F \in \mathcal{E}$.

The class of closed subsets of X is closed under finite intersections and is contained in $\mathcal{E}$. By the Dynkin system theorem [1, Theorem 4.1.2], $\mathcal{E} = \mathcal{B}_X$. This proves $\Sigma^* \subset \mathcal{B}_{P(X)}$.

To prove the reverse containment, it suffices to prove that $\mathcal{U}_3 \subset \Sigma^*$, where $\mathcal{U}_3$ is the countable basis for the topology on P(X) defined in Theorem B.2. To show this, it suffices to prove that the sets of the form $V_p(g;\epsilon)$ are in $\Sigma^*$. But

\[ V_p(g;\epsilon) = \Big\{q:\ \Big|\int g\,dq - \int g\,dp\Big| < \epsilon\Big\}, \]

and since $q \to \int g\,dq$ is $\Sigma^*$ measurable [9, (2.2)], $V_p(g;\epsilon)$ is also. QED

Corollary B.3.1  Let X be a Borel space and $(\Omega,\mathcal{A})$ a measure space. Let $\varphi$ be a measurable map from $(\Omega,\mathcal{A})$ to P(X) and let f be a function from $\Omega X$ to the extended real numbers, measurable with respect to the product $\sigma$-algebra $\mathcal{A}\mathcal{B}_X$. Assume f is bounded or f is nonnegative. Then the mapping

\[ \omega \to \int_X f(\omega,x)\,\varphi(\omega)(dx) \]

is measurable from $(\Omega,\mathcal{A})$ to the extended real numbers.

Proof:

If f is bounded, this follows from Theorem B.3 and [9], (2.2). If f is nonnegative, let $\{f_n\}$ be a sequence of bounded measurable functions converging up to f. The mappings $\omega \to \int_X f_n(\omega,x)\,\varphi(\omega)(dx)$ are measurable, so their limit is also. QED


Corollary B.3.2  Let X be a Borel space and f a measurable function from X to the extended real numbers. Assume f is bounded or f is nonnegative. Then the mapping

\[ p \to \int f\,dp \]

is measurable from P(X) to the extended real numbers.

Proof:

Let $\Omega = P(X)$, $\mathcal{A} = \mathcal{B}_{P(X)}$, and $\varphi(p) = p$ in Corollary B.3.1. QED

Corollary B.3.3  Let X be a Borel space, $\mathcal{F}$ a class of subsets of X which generates the $\sigma$-algebra $\mathcal{B}_X$, and $(\Omega,\mathcal{A})$ a measure space. A mapping $\varphi$ from $(\Omega,\mathcal{A})$ to P(X) is measurable if and only if the mappings

\[ \omega \to \varphi(\omega)(B), \qquad B \in \mathcal{F}, \]

are measurable.

Proof:

The mapping $\omega \to \varphi(\omega)(B)$ is $\theta_B \circ \varphi$. If $\varphi$ is measurable, the composition is also. If the composition is measurable, then $\varphi^{-1}\{\theta_B^{-1}[0,c)\}$ is in $\mathcal{A}$ for each c>0 and $B \in \mathcal{F}$. Since the sets of the form $\theta_B^{-1}[0,c)$, $B \in \mathcal{F}$, generate $\mathcal{B}_{P(X)}$, $\varphi$ is measurable. QED


Lemma B.1  Let D be a dense subset of a metric space X and g a uniformly continuous function from D into a complete metric space Z. Then g has a unique extension to a continuous function on X.

Proof:

For $x \notin D$, let $\{x_k\} \subset D$ be such that $x_k \to x$. By the uniform continuity of g, $\{g(x_k)\}$ is Cauchy in Z, so g(x) can be defined as $\lim_k g(x_k)$. Use the uniform continuity of g again to show this extension is well-defined. It is clearly continuous and unique. QED

Definition B.1  Let X and Y be Borel spaces and $\varphi: X \to Y$ be a Borel measurable one-to-one function such that $\varphi(X)$ is in $\mathcal{B}_Y$ and $\varphi^{-1}$ is Borel measurable. Then $\varphi$ is said to be a Borel isomorphism, and X and $\varphi(X)$ are Borel isomorphic.

As a result of the following theorem [26, Chapter I, Corollary 3.3], we need only check that $\varphi$ in Definition B.1 is Borel measurable and one-to-one to conclude that it is a Borel isomorphism.

Theorem B.4 (Kuratowski Theorem)  If X and Y are Borel spaces and $\varphi: X \to Y$ is Borel measurable and one-to-one, then $\varphi(X)$ is Borel in Y and $\varphi^{-1}$ is Borel measurable.

Theorem B.5  Let X and Y be Borel spaces and $\theta: X \to Y$ a homeomorphism.$^1$ Then P(X) is homeomorphic to a Borel subset of P(Y).

$^1$ A homeomorphism is a topology-preserving one-to-one mapping of one topological space into another. We do not require it to be onto.

Proof:

Define $\varphi: P(X) \to P(Y)$ by

\[ \varphi(p)(B) = p(\theta^{-1}(B)), \qquad B \in \mathcal{B}_Y. \]

The mapping $\varphi$ is a one-to-one mapping of P(X) onto $\{q \in P(Y):\ q(\theta(X)) = 1\}$ and this is in $\mathcal{B}_{P(Y)}$ by Theorems B.3 and B.4.

If $\{p_n\}$ is a sequence in P(X) converging to $p \in P(X)$, and f is in C(Y), then $f \circ \theta$ is in C(X) and

\[ \int_Y f\,d\varphi(p_n) = \int_X (f \circ \theta)\,dp_n \to \int_X (f \circ \theta)\,dp = \int_Y f\,d\varphi(p), \]

so $\varphi$ is continuous.

Let d be a metric on Y consistent with its topology and define a metric d' on X by

\[ d'(x_1,x_2) = d(\theta(x_1),\theta(x_2)), \qquad x_1, x_2 \in X. \]

Since $\theta$ is a homeomorphism, d' is consistent with the topology on X. Let g' be in $U_{d'}(X)$. Then $g = g' \circ \theta^{-1}$ is in $U_d(\theta(X))$. By Lemma B.1 and the Tietze extension theorem [11], g can be extended to $g^* \in C(Y)$. If $\{p_n\}$ is a sequence in P(X) and for some $p \in P(X)$, $\varphi(p_n) \to \varphi(p)$ in P(Y), then

\[ \int_X g'\,dp_n = \int_{\theta(X)} g\,d\varphi(p_n) = \int_Y g^*\,d\varphi(p_n) \to \int_Y g^*\,d\varphi(p) = \int_X g'\,dp, \]

so $\varphi^{-1}$ is continuous. QED
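
The mapping $\varphi(p)(B) = p(\theta^{-1}(B))$ used in this proof is the image (pushforward) measure, and the first chain of equalities in the proof is the change of variables formula. A minimal sketch for measures with finite support follows. The helpers `pushforward` and `integrate`, the map `theta` and the numbers are hypothetical and chosen only so the identity can be checked by hand.

```python
from fractions import Fraction as F

def pushforward(p, theta):
    """Image of the finitely supported measure p (dict point -> mass) under theta."""
    q = {}
    for x, mass in p.items():
        y = theta(x)
        q[y] = q.get(y, 0) + mass
    return q

def integrate(f, p):
    """Integral of f with respect to the finitely supported measure p."""
    return sum(f(x) * mass for x, mass in p.items())

theta = lambda x: 2 * x + 1            # hypothetical one-to-one map X -> Y
p = {0: F(1, 4), 1: F(3, 4)}           # probability measure on X = {0, 1}
q = pushforward(p, theta)              # phi(p), concentrated on theta(X) = {1, 3}
f = lambda y: y * y                    # test function on Y
lhs = integrate(f, q)                          # integral of f d(phi(p))
rhs = integrate(lambda x: f(theta(x)), p)      # integral of (f o theta) dp
```

Because `theta` is one-to-one, no two points of X are merged and p can be recovered from q, which is the finite counterpart of $\varphi$ being one-to-one.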

Theorem  B.6  If  Y is  a complete  separable  space,  then  P(Y)  is  a complete 
separable  space. 


See  [26],  Chapter  II,  Theorems  6.2  and  6.5  for  a proof. 


Corollary B.6.1  If X is a Borel space, then P(X) is also.

Proof:

By Theorems B.5 and B.6, P(X) is homeomorphic to a Borel subset of a complete separable space. QED

Theorem B.7  Let X be a Borel space and $\tilde{X} = \{p_x:\ x \in X\}$, where $p_x$ is the probability assigning mass one to x. Then X is homeomorphic to $\tilde{X}$ and $\tilde{X}$ is a Borel subset of P(X).

Proof:

Define $\theta(x) = p_x$. Let $x_k \to x$ and G be open in X. If $x \in G$, then for sufficiently large k, $x_k \in G$ and

\[ \liminf_k p_{x_k}(G) = 1 = p_x(G). \]

If $x \notin G$, then

\[ \liminf_k p_{x_k}(G) \ge 0 = p_x(G). \]

Therefore $p_{x_k} \to p_x$ and $\theta$ is continuous by Theorem B.1(e).

Now let $p_{x_k}$ be a sequence converging to $p_x$ in $\tilde{X}$. Let G be an open set containing x. By Theorem B.1(e),

\[ \liminf_k p_{x_k}(G) \ge p_x(G) = 1, \]

and so $x_k \in G$ for all k sufficiently large. This implies $x_k \to x$ and $\theta^{-1}$ is continuous.

The set $\tilde{X}$ is Borel by Theorem B.4. QED

We  separate  out  a part  of  the  proof  of  Theorem  B.7  as  a corollary. 

Corollary B.7.1  Let X be a Borel space. The mapping $x \to p_x$ is continuous from X to P(X).

We now prove a lemma (given here as Lemma B.4) found in [6]. The proof given there contains an error which is corrected below.

If X is a compact metric space, $2^X$ is the set of (possibly empty) closed subsets of X. If d is the metric in X and $K_1$ and $K_2$ are nonempty closed subsets of X, we define the Hausdorff metric [15, Section 28] by

\[ d(K_1,K_2) = \max\Big\{\sup_{x \in K_1} d(x,K_2),\ \sup_{x \in K_2} d(x,K_1)\Big\}, \]

where

\[ d(x,K) = \inf_{y \in K} d(x,y). \]

If $K_1 = \emptyset$ and $K_2$ is a nonempty closed subset of X, define

\[ d(K_1,K_2) = d(K_2,K_1) = \mathrm{diameter}(X), \]

where $\mathrm{diameter}(X) = \sup_{x,y \in X} d(x,y)$ is finite because X is compact. Thus the empty set is an isolated element of $2^X$. This metric gives rise to the exponential topology on $2^X$ ([19], Section 17; [20], Section 43) in which $2^X$ is a compact metric space.
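
For finite subsets the suprema and infima in the definition above are attained, so the Hausdorff metric can be computed directly. The sketch below is a small illustration under that finiteness assumption. The function names are hypothetical, and the value returned when both sets are empty is not specified in the text; zero is used here because a metric must vanish on identical arguments.

```python
def dist_to_set(x, K, d):
    """d(x, K) = min over y in K of d(x, y), for a nonempty finite set K."""
    return min(d(x, y) for y in K)

def hausdorff(K1, K2, d, diameter):
    """Hausdorff distance between finite sets, with the empty-set convention
    d(empty, K) = diameter(X) for nonempty K."""
    if not K1 and not K2:
        return 0.0          # assumption: distance from the empty set to itself
    if not K1 or not K2:
        return diameter
    return max(max(dist_to_set(x, K2, d) for x in K1),
               max(dist_to_set(x, K1, d) for x in K2))

d = lambda x, y: abs(x - y)      # usual metric on X = [0, 1], diameter 1
h = hausdorff({0.0, 0.5}, {0.5, 1.0}, d, diameter=1.0)
h_empty = hausdorff(set(), {0.5, 1.0}, d, diameter=1.0)
```

In this example every nonempty set is within distance 1/2 of... [0,1] is not needed; the point is only that the two displayed sets are at distance 1/2 from each other, while the empty set sits at the full diameter from any nonempty set.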

We  denote  by  N the  countable  cross  product  of  the  set  of  positive 
integers.  Let  the  set  of  positive  integers  have  the  discrete  topology  and  N 
the  product  topology.  The  space  N has  a metrization  consistent  with  this 
topology.  If  z is  an  element  of  N,  zi  will  be  its  i-th  component.  A sequence 
in  N will  be  indicated  by  z(1),  z(2),  .... 


We define functions from N to the power set of N by

\[ L(z) = \{w \in N:\ w_i \le z_i,\ i=1,2,\ldots\}, \]

\[ N_k(z) = \{w \in N:\ w_i = z_i,\ i=1,2,\ldots,k\}. \]

For each $z \in N$, L(z) is a compact set by Tychonoff's Theorem.

Lemma B.2  Let f be a continuous function from N to a metric space X. For $z \in N$,

\[ \sup_{y \in L(z)} \mathrm{diameter}(f[N_k(y)]) \to 0 \quad \text{as } k \to \infty. \]

Proof:

Suppose the contrary. Since $N_k(y) \supset N_{k+1}(y)$, the expression above is monotone decreasing in k. For some $\epsilon > 0$ and every k there is a $y_k \in L(z)$ such that

\[ \mathrm{diameter}(f[N_k(y_k)]) \ge \epsilon. \]

Let $\Pi_k = \{y \in L(z):\ \mathrm{diameter}(f[N_k(y)]) \ge \epsilon\}$. Then $\Pi_k \supset \Pi_{k+1}$ and $\Pi_k \ne \emptyset$ for each k.

Since $N_k(y)$ depends only on the first k components of y, the set

\[ \{y \in N:\ \mathrm{diameter}(f[N_k(y)]) < \epsilon\} \]

is open in N, and so $\Pi_k$ is closed, therefore compact. It follows that there exists $y^* \in \bigcap_{k=1}^{\infty} \Pi_k$. For each k, there exists $z(k) \in N_k(y^*)$ with

\[ d(f(z(k)),f(y^*)) \ge \epsilon/3. \]

But $z(k) \to y^*$ and f is continuous. This leads to a contradiction. QED

Lemma B.3  Let X be a complete separable metric space and f a continuous function from N to X. For $p \in P(X)$,

\[ p[f(N)] = \sup_{z \in N} p(f[L(z)]). \]

Proof:

Since for each z, L(z) is compact, f(L(z)) is compact and thus in $\mathcal{B}_X$. By Corollary A.5.1, the p measure of f(N) can be defined.

For each z,

\[ f(N) \supset f[L(z)], \]

so

\[ p[f(N)] \ge \sup_{z \in N} p(f[L(z)]). \]

Define

\[ N(n_1,\ldots,n_k) = \{w \in N:\ w_1 \le n_1, \ldots, w_k \le n_k\}, \]

\[ A(n_1,\ldots,n_k) = f[N(n_1,\ldots,n_k)]. \]

Since $N(n_1,\ldots,n_k)$ is open in N, it is Borel and $A(n_1,\ldots,n_k)$ is analytic. The increasing sequence $\{N(n_1,\ldots,n_{k-1},1),\ N(n_1,\ldots,n_{k-1},2),\ \ldots\}$ converges to $N(n_1,\ldots,n_{k-1})$, so the sequence $\{A(n_1,\ldots,n_{k-1},1),\ A(n_1,\ldots,n_{k-1},2),\ \ldots\}$ is also increasing and converges to $A(n_1,\ldots,n_{k-1})$. The sequence $\{A(1), A(2), \ldots\}$ increases to f(N).

Given $\epsilon > 0$, construct a sequence $z = (z_1, z_2, \ldots)$ of positive integers for which

\[ p[f(N)] \le p[A(z_1)] + \epsilon/2, \]

\[ p[A(z_1,\ldots,z_{k-1})] \le p[A(z_1,\ldots,z_k)] + \epsilon/2^k. \]

Then $f(N) \supset A(z_1) \supset A(z_1,z_2) \supset \ldots$, and defining

\[ A(z) = \bigcap_{k=1}^{\infty} A(z_1,\ldots,z_k), \]

we have

\[ p[f(N)] \le p[A(z)] + \epsilon. \]

To conclude the proof, we show A(z) = f[L(z)].

Clearly $A(z) \supset f[L(z)]$, so we show the reverse containment. Suppose $x \in A(z)$. Then for each k, there exists $y(k) \in N(z_1,\ldots,z_k)$ such that x = f(y(k)). The sequence of positive integers $\{y(1)_1, y(2)_1, \ldots\}$ takes some value $w_1$ between 1 and $z_1$ infinitely often. Let $I_1$ be an infinite index set such that $y(j)_1 = w_1$ for $j \in I_1$. The sequence $\{y(j)_2:\ j \in I_1\}$ takes some value $w_2$ between 1 and $z_2$ infinitely often. Let $I_2 \subset I_1$ be an infinite index set such that $y(j)_2 = w_2$ for $j \in I_2$. Continuing in this manner, construct a sequence of infinite index sets $I_1 \supset I_2 \supset \ldots$ and $w \in L(z)$ such that $y(j)_k = w_k$ for $j \in I_k$, k=1,2,.... Choose $m_1 \in I_1$, $m_2 \in I_2$, ... such that $m_1 < m_2 < \ldots$. The sequence $y(m_1), y(m_2), \ldots$ converges to w, and since f is continuous, x = f(w). Therefore $x \in f[L(z)]$. QED
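
The construction of w in the last paragraph is a coordinate-by-coordinate selection of nested index sets. On a finite list of sequences "infinitely often" has no meaning, so the sketch below substitutes "most often" and should be read only as a finite analogue of the argument. The name `diagonal_select` and the sample data are hypothetical.

```python
from collections import Counter

def diagonal_select(ys, depth):
    """For each coordinate k < depth, pick a value taken most often among the
    sequences still under consideration, then keep only those sequences.
    Returns the selected prefix w and the nested index sets I_1, I_2, ..."""
    idx = list(range(len(ys)))
    w, nested = [], []
    for k in range(depth):
        value, _ = Counter(ys[j][k] for j in idx).most_common(1)[0]
        idx = [j for j in idx if ys[j][k] == value]
        w.append(value)
        nested.append(list(idx))
    return tuple(w), nested

# Hypothetical finite sample of integer sequences (truncated to 3 coordinates).
ys = [(1, 1, 2), (2, 1, 1), (1, 2, 1), (1, 1, 1), (1, 1, 2), (2, 2, 2)]
w, nested = diagonal_select(ys, depth=3)
```

Every sequence whose index survives to stage k agrees with w in its first k coordinates, which mirrors the property $y(j)_k = w_k$ for $j \in I_k$ used to obtain convergence to w.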

Lemma B.4  Let X be a compact metric space and A an analytic subset of X (see Appendix A). Let c be a real number. Then

\[ \{p \in P(X):\ p(A) > c\} \]

is analytic.

Proof: 

Since A is analytic, there is a continuous function $f: N \to X$ such that f(N) = A. We show that the function

\[ z \to f[L(z)] \]

is continuous from N to $2^X$.

Note that since L(z) is compact and f is continuous, f[L(z)] is in $2^X$. Let $z(n) \to z$ in N. Choose $\epsilon > 0$. By Lemma B.2, there exists k such that

\[ \sup_{y \in L(z)} \mathrm{diameter}(f[N_k(y)]) < \epsilon. \]

There exists $N_k$ such that $n > N_k$ implies $z(n)_i = z_i$, i=1,2,...,k. Suppose $y \in L(z)$. Let $y(n)_i = \min\{z(n)_i, y_i\}$. For $n > N_k$, $y(n) \in N_k(y)$ and

\[ d(f(y),f(y(n))) < \epsilon, \]

where d is the metric on X. Therefore

(B.1)  $\sup_{x \in f[L(z)]} d(x,f[L(z(n))]) \to 0$

as $n \to \infty$.

For $w(n) \in L(z(n))$, define $w \in L(z)$ by

\[ w_i = \min\{w(n)_i, z_i\}. \]

Then as before for $n > N_k$, we have $w(n) \in N_k(w)$ and

\[ d(f(w(n)),f(w)) < \epsilon. \]

Therefore

(B.2)  $\sup_{x_n \in f[L(z(n))]} d(x_n,f[L(z)]) \to 0$

as $n \to \infty$. Relations (B.1) and (B.2) imply $f[L(z(n))] \to f[L(z)]$ in $2^X$ and this establishes continuity.

By [9, (3.8)], the mapping

\[ (p,K) \to p(K) \]

is upper semicontinuous from $P(X)2^X$ to [0,1]. Therefore the composition

\[ (p,z) \to p(f[L(z)]) \]

is upper semicontinuous, i.e.

\[ B(c) = \{(p,z):\ p(f[L(z)]) \ge c\} \]

is closed in P(X)N for each real c. Then

\[ B = \{(p,z):\ p(f[L(z)]) > c\} = \bigcup_{k=1}^{\infty} B(c + 1/k) \]

is Borel in P(X)N, and by Lemma B.3

\[ \{p \in P(X):\ p(A) > c\} \]

is the projection of B on the P(X)-axis. This projection is analytic by Theorem A.3. QED

Note that by Theorem A.2, the following statements are equivalent:

(a)  $\{p:\ p(A) > c\}$ is analytic for each real c;

(b)  $\{p:\ p(A) \ge c\}$ is analytic for each real c.

We will use (a) and (b) interchangeably.

We now extend Lemma B.4 to the noncompact case.

Theorem B.8  Let X be a Borel set and A an analytic subset of X. Let c be a real number. Then $\{p \in P(X):\ p(A) > c\}$ and $\{p \in P(X):\ p(A) \ge c\}$ are analytic.

Proof: 

By Urysohn's Theorem [11, Chapter IX, Corollary 9.2], there is a homeomorphism $\theta$ which embeds X in a compact metric space Y. Let $\varphi: P(X) \to P(Y)$ be the homeomorphism defined in Theorem B.5. By Theorem A.3, $\theta(A)$ is


APPENDIX C
STOCHASTIC KERNELS

Definition C.1  Let $(X,\mathcal{A})$ and $(Y,\mathcal{B})$ be measure spaces. Then q(dy|x) is an $\mathcal{A}$-measurable stochastic kernel on Y given X if

(a)  $q(\cdot|x)$ is a probability measure on $(Y,\mathcal{B})$ for each x;

(b)  $q(E|\cdot)$ is $\mathcal{A}$-measurable for each $E \in \mathcal{B}$.

When there is no possibility of confusion, q will be called simply a stochastic kernel. If Y is a product space Y = WZ, we will write q(dw Z|x) to denote the stochastic kernel which, for each x, is the marginal of q(dy|x) on W. Similarly the marginal on Z will be represented by q(W dz|x). If p is a measure on $(Y,\mathcal{B})$, we will write p(dw Z) and p(W dz) to indicate the marginals of p on W and Z respectively.

The  two  major  existence  theorems  concerning  stochastic  kernels  follow. 
The  first  is  well-known  and  states  that  in  a product  of  Borel  spaces,  any 
measure  can  be  decomposed  into  its  marginal  and  a stochastic  kernel.  The 
second  is  a generalization  of  the  first  to  include  a measurable  dependence  on 
a parameter. 

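Theorem C.1  Let X and Y be Borel spaces and p an element of P(XY). Then there exists a stochastic kernel q(dy|x) such that

\[ p(\underline{X}\,\underline{Y}) = \int_{\underline{X}} q(\underline{Y}|x)\,p(dx\,Y) \]

for every $\underline{X} \in \mathcal{B}_X$, $\underline{Y} \in \mathcal{B}_Y$.
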
Proof: 

This is an easy consequence of [1], Theorems 6.6.5 and 6.6.6. As a result of these theorems, given $p \in P(XY)$ and a $\sigma$-algebra $\mathcal{D} \subset \mathcal{B}_{XY}$, there exists a regular conditional probability Q((x,y),B), i.e. a function on $XY\mathcal{B}_{XY}$ which is $\mathcal{D}$-measurable in (x,y) for fixed $B \in \mathcal{B}_{XY}$ and a probability measure on $(XY,\mathcal{B}_{XY})$ for fixed (x,y). If $\mathcal{D}$ is the $\sigma$-algebra of sets of the form $\underline{X}Y$, $\underline{X} \in \mathcal{B}_X$, then Q((x,y),B) is independent of y. Define

\[ q(\underline{Y}|x) = Q((x,y),X\underline{Y}), \]

where y is arbitrary. Then by the defining property of conditional probabilities

\[ p(\underline{X}\,\underline{Y}) = \int_{\underline{X}Y} Q((x,y),X\underline{Y})\,p(d(x,y)) = \int_{\underline{X}} q(\underline{Y}|x)\,p(dx\,Y) \]

for $\underline{X} \in \mathcal{B}_X$, $\underline{Y} \in \mathcal{B}_Y$. QED
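
On a finite product space the decomposition in Theorem C.1 is ordinary conditioning: the kernel is the joint mass divided by the marginal mass. The sketch below illustrates this under the finiteness assumption, which is not made in the text. The helper names `disintegrate` and `rectangle_mass` and the joint distribution `p` are hypothetical.

```python
from fractions import Fraction as F

def disintegrate(p):
    """Split a joint distribution p (dict (x, y) -> mass) into its marginal on X
    and a kernel q with kernel[x][y] = q({y}|x), defined where the marginal is positive."""
    marginal = {}
    for (x, y), mass in p.items():
        marginal[x] = marginal.get(x, 0) + mass
    kernel = {}
    for (x, y), mass in p.items():
        if marginal[x] > 0:
            kernel.setdefault(x, {})[y] = mass / marginal[x]
    return marginal, kernel

def rectangle_mass(marginal, kernel, xs, ys):
    """Right hand side of Theorem C.1 for the rectangle xs x ys."""
    return sum(kernel[x].get(y, 0) * marginal[x]
               for x in xs if marginal.get(x, 0) > 0
               for y in ys)

# Hypothetical joint distribution on X = {'a', 'b'}, Y = {0, 1}.
p = {('a', 0): F(1, 8), ('a', 1): F(3, 8), ('b', 0): F(1, 2)}
marginal, kernel = disintegrate(p)
```

At points x of zero marginal mass the kernel may be chosen arbitrarily, which is the finite counterpart of q being determined only up to a set of marginal measure zero.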

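Theorem C.2  Let X and Y be Borel spaces and $(\Omega,\mathcal{A})$ a measure space. Let $p(d(x,y)|\omega)$ be a stochastic kernel on XY given $\Omega$. Then there exists a $\mathcal{B}_X\mathcal{A}$-measurable stochastic kernel $q(dy|x,\omega)$ such that

\[ p(\underline{X}\,\underline{Y}|\omega) = \int_{\underline{X}} q(\underline{Y}|x,\omega)\,p(dx\,Y|\omega) \]

for every $\underline{X} \in \mathcal{B}_X$, $\underline{Y} \in \mathcal{B}_Y$.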

Proof: 

The theorem follows from [36], Lemma A.1.9, when X and Y are the real line. This can be extended to allow X and Y to be Borel subsets of the unit interval. By Theorem B.4 and [1], Theorem 6.6, every Borel space is Borel isomorphic to a Borel subset of the unit interval. QED

Theorem C.3  Let X be a Borel space and $(\Omega,\mathcal{A})$ a measure space. A function $q(\underline{X}|\omega)$ on $\mathcal{B}_X\Omega$ is an $\mathcal{A}$-measurable stochastic kernel if and only if for each $\omega$, $q(\cdot|\omega) \in P(X)$ and the mapping

\[ \omega \to q(\cdot|\omega) \]

is $\mathcal{A}$-measurable.

Proof:

This is a direct consequence of Corollary B.3.3. QED